Preconditioning for feature selection in classification

Pretorius, Jani

dc.contributor.advisor	Steel, S. J.	en_ZA
dc.contributor.author	Pretorius, Jani	en_ZA
dc.contributor.other	Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.	en_ZA
dc.date.accessioned	2019-02-25T16:54:29Z
dc.date.accessioned	2019-04-17T08:26:10Z
dc.date.available	2019-02-25T16:54:29Z
dc.date.available	2019-04-17T08:26:10Z
dc.date.issued	2019-04
dc.description	Thesis (MCom)--Stellenbosch University, 2019.	en_ZA
dc.description.abstract	ENGLISH SUMMARY : Increased dimensionality of data is a clear trend that has been observed over the past few decades. However, analysing high-dimensional data in order to predict an outcome can be problematic. In certain cases, such as when analysing genomic data, a predictive model that is both interpretable and accurate is required. Many techniques focus on solving these two components simultaneously; however, when the data are high-dimensional and noisy, such an approach may perform poorly. Preconditioning is a two-stage technique that aims to reduce the noise inherent in the training data before making final predictions. In doing so, it addresses the issues of interpretability and accuracy separately. The literature on this technique focuses on the regression case, but in this thesis, the technique is applied in a classification setting. An overview of the theory surrounding this method is provided, as well as an empirical analysis of the method. A simulation study evaluates the performance of the technique under various scenarios and compares the results to those obtained by standard (non-preconditioned) models. Thereafter, the models are applied to real-world datasets and their performances are compared. Based on the results of the empirical work, it appears that, at their best, preconditioned classifiers can only reach a performance that is on par with standard classifiers. This is in contrast to the regression case, where the literature has shown that preconditioning can outperform standard regression models in high-dimensional settings.	en_ZA
dc.description.abstract	AFRIKAANS SUMMARY : An increase in the dimensionality of datasets is a clear trend that has emerged over the past few decades. Analysing higher-dimensional data in order to predict an outcome can be problematic. In certain cases, such as when genetic data are analysed, a predictive model that is both interpretable and accurate is desired. Many techniques focus on solving these two aspects simultaneously, but when the data are high-dimensional and contain noise, this approach can yield poor results. Preconditioning is a two-stage process that aims to reduce the noise in the training dataset before a final prediction is made. In doing so, it addresses the issues of interpretability and accuracy separately. The literature emphasises the regression case. In this thesis, however, the technique is applied in a classification context. An overview of the theory concerning this method is provided, as well as empirical studies. Simulation studies evaluate the performance of the technique under various circumstances and compare the outcomes with those achieved by standard (non-preconditioned) models. Thereafter, the models are applied to real-world datasets and their results are compared. Based on the results of the empirical work, it appears that preconditioned classification models can, at their best, only perform as well as standard classification models. These findings stand in contrast to the regression case, where the literature shows that preconditioning can outperform standard regression models in high-dimensional cases.	af_ZA
dc.format.extent	xvi, 128 pages ; illustrations, includes annexures
dc.identifier.uri	http://hdl.handle.net/10019.1/106057
dc.language.iso	en_ZA	en_ZA
dc.publisher	Stellenbosch : Stellenbosch University
dc.rights.holder	Stellenbosch University
dc.subject	High-dimensional data -- Statistical methods	en_ZA
dc.subject	Preconditioning	en_ZA
dc.subject	Statistical learning theory	en_ZA
dc.subject	Supervised learning (Machine learning)	en_ZA
dc.subject	Dimension reduction (Statistics)	en_ZA
dc.subject	Variables (Mathematics) -- Statistical methods	en_ZA
dc.subject	Predictive modeling	en_ZA
dc.subject	Discriminant analysis	en_ZA
dc.subject	UCTD
dc.title	Preconditioning for feature selection in classification	en_ZA
dc.type	Thesis	en_ZA

Files

Original bundle

Name: pretorius_preconditioning_2019.pdf
Size: 2 MB
Format: Adobe Portable Document Format

Collections

Masters Degrees (Statistics and Actuarial Science)