Aspects of multi-class nearest hypersphere classification
Thesis (MCom)--Stellenbosch University, 2017.
ENGLISH SUMMARY : Using hyperspheres in the analysis of multivariate data is not common practice in Statistics. However, hyperspheres have interesting properties that are useful for data analysis in the following areas: domain description (finding a support region), outlier detection (novelty detection) and the classification of objects into known classes. This thesis demonstrates how a hypersphere is fitted around a single dataset to obtain a support region and an outlier detector. The all-enclosing and 𝜐-soft hyperspheres are derived. The hypersphere approach is then extended to multi-class classification, termed nearest hypersphere classification (NHC), and different aspects of multi-class NHC are investigated. To study its classification performance, NHC is compared to three other classification techniques: support vector machine classification, random forests and penalised linear discriminant analysis. NHC requires choosing a kernel function; in this thesis the Gaussian kernel is used. NHC also depends on selecting an appropriate kernel hyper-parameter 𝛾 and a tuning parameter 𝐶. The behaviour of the error rate and the fraction of support vectors is investigated for different values of 𝛾 and 𝐶. Two methods of obtaining an optimal 𝛾 value for NHC are investigated: the first uses a differential evolution procedure, executed with the R function DEoptim(), and the second uses the R function sigest(). The first method depends on the classification technique, while the second is executed independently of it.
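The multi-class scheme summarised above can be sketched as follows. This is an illustrative Python implementation, not the author's R code: a 𝜐-soft hypersphere with a Gaussian kernel is equivalent to the one-class SVM formulation, so scikit-learn's OneClassSVM is reused here, with `nu` and `gamma` standing in for the thesis's 𝜐 and 𝛾. The class name `NearestHypersphereClassifier` and the rule of assigning a point to the class whose hypersphere boundary it lies deepest inside are assumptions for the sketch.

```python
# Hedged sketch of nearest hypersphere classification (NHC):
# fit one nu-soft Gaussian-kernel hypersphere per class, then assign
# each test point to the class whose hypersphere it lies deepest inside.
import numpy as np
from sklearn.svm import OneClassSVM


class NearestHypersphereClassifier:
    def __init__(self, nu=0.1, gamma=1.0):
        self.nu, self.gamma = nu, gamma
        self.models_ = {}

    def fit(self, X, y):
        # One nu-soft hypersphere (one-class SVM with RBF kernel,
        # equivalent to SVDD for the Gaussian kernel) per class.
        for label in np.unique(y):
            m = OneClassSVM(kernel="rbf", nu=self.nu, gamma=self.gamma)
            m.fit(X[y == label])
            self.models_[label] = m
        return self

    def predict(self, X):
        # decision_function gives a signed score: positive inside the
        # hypersphere, negative outside. Pick the class with the
        # largest score for each point.
        labels = list(self.models_)
        scores = np.column_stack(
            [self.models_[lab].decision_function(X) for lab in labels]
        )
        return np.array(labels)[scores.argmax(axis=1)]
```

In this sketch both 𝛾 (`gamma`) and the softness parameter are fixed by hand; the thesis's tuning question corresponds to choosing these values, e.g. by differential evolution over a validation error rate or by a quantile-based heuristic such as sigest().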
AFRIKAANS SUMMARY : No summary available.