Classification in high dimensional data using sparse techniques

Stulumani, Agrippa (2019-04)

Thesis (MCom)--Stellenbosch University, 2019.

ENGLISH SUMMARY : Traditional classification techniques fail in the analysis of high-dimensional data. In response, new classification techniques and accompanying theory have recently emerged. These techniques are natural extensions of linear discriminant analysis. The aim is to solve the statistical challenges that arise with high-dimensional data by utilising sparse coding (Johnstone and Titterington, 2009). In this project, our focus is on the following techniques: penalized LDA-L1, penalized LDA-FL, sparse discriminant analysis, sparse mixture discriminant analysis and sparse partial least squares. We evaluated the performance of these techniques in simulation studies and on two microarray gene expression datasets by comparing the test error rates and the number of features selected. In the simulation studies, we found that performance varies depending on the simulation set-up and on the classification technique used. The two microarray gene expression datasets serve as a practical application of these techniques. The results on these datasets showed that the classification techniques achieve satisfactory accuracy.

AFRIKAANS SUMMARY : No summary available.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/105792