Preconditioning for feature selection in classification

dc.contributor.advisor: Steel, S. J. (en_ZA)
dc.contributor.author: Pretorius, Jani (en_ZA)
dc.contributor.other: Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. (en_ZA)
dc.date.accessioned: 2019-02-25T16:54:29Z
dc.date.accessioned: 2019-04-17T08:26:10Z
dc.date.available: 2019-02-25T16:54:29Z
dc.date.available: 2019-04-17T08:26:10Z
dc.date.issued: 2019-04
dc.description: Thesis (MCom)--Stellenbosch University, 2019. (en_ZA)
dc.description.abstract: ENGLISH SUMMARY: Increased dimensionality of data is a clear trend that has been observed over the past few decades. However, analysing high-dimensional data in order to predict an outcome can be problematic. In certain cases, such as when analysing genomic data, a predictive model that is both interpretable and accurate is required. Many techniques focus on solving these two components simultaneously; however, when the data are high-dimensional and noisy, such an approach may perform poorly. Preconditioning is a two-stage technique that aims to reduce the noise inherent in the training data before making final predictions. In doing so, it addresses the issues of interpretability and accuracy separately. The literature on this technique focuses on the regression case, but in this thesis, the technique is applied in a classification setting. An overview of the theory surrounding this method is provided, as well as an empirical analysis of the method. A simulation study evaluates the performance of the technique under various scenarios and compares the results to those obtained by standard (non-preconditioned) models. Thereafter, the models are applied to real-world datasets and their performances are compared. Based on the results of the empirical work, it appears that, at their best, preconditioned classifiers can only reach a performance that is on par with standard classifiers. This is in contrast to the regression case, where the literature has shown that preconditioning can outperform standard regression models in high-dimensional settings. (en_ZA)
dc.description.abstract: AFRIKAANSE OPSOMMING: ’n Toename in die dimensionaliteit van datastelle is ’n duidelike tendens wat oor die afgelope paar dekades na voorskyn gekom het. Om hoër-dimensionele data te analiseer sodat ’n uitkoms voorspel kan word, kan problematies wees. In sekere gevalle, soos wanneer genetiese data geanaliseer word, word ’n voorspellende model wat beide interpreteerbaar, sowel as akkuraat is, verlang. Baie tegnieke fokus daarop om hierdie twee aspekte gelyktydig op te los, maar wanneer die data van ’n hoë dimensie is en geruis bevat, kan hierdie benadering swak resultate oplewer. Prekondisionering is ’n twee-fase proses wat daarop gemik is om die geruis in die afrigdatastel te verminder voordat ’n finale voorspelling gemaak word. Sodoende spreek dit die kwessies van interpreteerbaarheid en akkuraatheid afsonderlik aan. In die literatuur word daar klem gelê op die regressie geval. In hierdie tesis word die tegniek egter toegepas in ’n klassifikasie konteks. ’n Oorsig van die teorie aangaande hierdie metode word verskaf, sowel as empiriese studies. Simulasie studies evalueer die prestasie van die tegniek onder verskeie omstandighede en vergelyk die uitkomste met dié wat deur standaard (nie-geprekondisioneerde) modelle behaal was. Daarna word die modelle toegepas op regte-wêreld datastelle en hul resultate vergelyk. Gebaseer op die resultate van die empiriese werk wil dit blyk asof geprekondisioneerde klassifikasiemodelle, op hul beste, slegs so goed as standaard klassifikasiemodelle kan presteer. Hierdie bevindinge staan in kontras met die regressie geval, waar die literatuur wys dat prekondisionering standaard regressiemodelle kan uitpresteer in hoë dimensionele gevalle. (af_ZA)
dc.format.extent: xvi, 128 pages ; illustrations, includes annexures
dc.identifier.uri: http://hdl.handle.net/10019.1/106057
dc.language.iso: en_ZA (en_ZA)
dc.publisher: Stellenbosch : Stellenbosch University
dc.rights.holder: Stellenbosch University
dc.subject: High-dimensional data -- Statistical methods (en_ZA)
dc.subject: Preconditioning (en_ZA)
dc.subject: Statistical learning theory (en_ZA)
dc.subject: Supervised learning (Machine learning) (en_ZA)
dc.subject: Dimension reduction (Statistics) (en_ZA)
dc.subject: Variables (Mathematics) -- Statistical methods (en_ZA)
dc.subject: Predictive modeling (en_ZA)
dc.subject: Discriminant analysis (en_ZA)
dc.subject: UCTD
dc.title: Preconditioning for feature selection in classification (en_ZA)
dc.type: Thesis (en_ZA)
Files
Original bundle
Name: pretorius_preconditioning_2019.pdf
Size: 2 MB
Format: Adobe Portable Document Format