PCA and CVA biplots : a study of their underlying theory and quality measures

Brand, Hilmarie (2013-03)

Thesis (MComm)--Stellenbosch University, 2013.

Thesis

ENGLISH ABSTRACT: The main topics of study in this thesis are the Principal Component Analysis (PCA) and Canonical Variate Analysis (CVA) biplots, with the primary focus falling on the quality measures associated with these biplots. A detailed study of different routes along which PCA and CVA can be derived precedes the study of the PCA biplot and CVA biplot respectively. Different perspectives on PCA and CVA highlight different aspects of the theory that underlie PCA and CVA biplots respectively and so contribute to a more solid understanding of these biplots and their interpretation. PCA is studied via the routes followed by Pearson (1901) and Hotelling (1933). CVA is studied from the perspectives of Linear Discriminant Analysis, Canonical Correlation Analysis as well as a two-step approach introduced in Gower et al. (2011). The close relationship between CVA and Multivariate Analysis of Variance (MANOVA) also receives some attention. An explanation of the construction of the PCA biplot is provided subsequent to the study of PCA. Thereafter follows an in depth investigation of quality measures of the PCA biplot as well as the relationships between these quality measures. Specific attention is given to the effect of standardisation on the PCA biplot and its quality measures. Following the study of CVA is an explanation of the construction of the weighted CVA biplot as well as two different unweighted CVA biplots based on the two-step approach to CVA. Specific attention is given to the effect of accounting for group sizes in the construction of the CVA biplot on the representation of the group structure underlying a data set. It was found that larger groups tend to be better separated from other groups in the weighted CVA biplot than in the corresponding unweighted CVA biplots. Similarly it was found that smaller groups tend to be separated to a greater extent from other groups in the unweighted CVA biplots than in the corresponding weighted CVA biplot. A detailed investigation of previously defined quality measures of the CVA biplot follows the study of the CVA biplot. It was found that the accuracy with which the group centroids of larger groups are approximated in the weighted CVA biplot is usually higher than that in the corresponding unweighted CVA biplots. Three new quality measures that assess that accuracy of the Pythagorean distances in the CVA biplot are also defined. These quality measures assess the accuracy of the Pythagorean distances between the group centroids, the Pythagorean distances between the individual samples and the Pythagorean distances between the individual samples and group centroids in the CVA biplot respectively.

AFRIKAANSE OPSOMMING: Die hoofonderwerpe van studie in hierdie tesis is die Hoofkomponent Analise (HKA) bistipping asook die Kanoniese Veranderlike Analise (KVA) bistipping met die primêre fokus op die kwaliteitsmaatstawwe wat daarmee geassosieer word. ’n Gedetailleerde studie van verskillende roetes waarlangs HKA en KVA afgelei kan word, gaan die studie van die HKA en KVA bistippings respektiewelik vooraf. Verskillende perspektiewe op HKA en KVA belig verskillende aspekte van die teorie wat onderliggend is tot die HKA en KVA bistippings respektiewelik en dra sodoende by tot ’n meer breedvoerige begrip van hierdie bistippings en hulle interpretasies. HKA word bestudeer volgens die roetes wat gevolg is deur Pearson (1901) en Hotelling (1933). KVA word bestudeer vanuit die perspektiewe van Linieêre Diskriminantanalise, Kanoniese Korrelasie-analise sowel as ’n twee-stap-benadering soos voorgestel in Gower et al. (2011). Die noue verwantskap tussen KVA en Meerveranderlike Analise van Variansie (MANOVA) kry ook aandag. ’n Verduideliking van die konstruksie van die HKA bistipping word voorsien na afloop van die studie van HKA. Daarna volg ’n indiepte-ondersoek van die HKA bistipping kwaliteitsmaatstawwe sowel as die onderlinge verhoudings tussen hierdie kwaliteitsmaatstawe. Spesifieke aandag word gegee aan die effek van die standaardisasie op die HKA bistipping en sy kwaliteitsmaatstawe. Opvolgend op die studie van KVA is ’n verduideliking van die konstruksie van die geweegde KVA bistipping sowel as twee veskillende ongeweegde KVA bistippings gebaseer op die twee-stap-benadering tot KVA. Spesifieke aandag word gegee aan die effek wat die inagneming van die groepsgroottes in die konstruksie van die KVA bistipping op die voorstelling van die groepstruktuur onderliggend aan ’n datastel het. Daar is gevind dat groter groepe beter geskei is van ander groepe in die geweegde KVA bistipping as in die oorstemmende ongeweegde KVA bistipping. Soortgelyk daaraan is gevind dat kleiner groepe tot ’n groter mate geskei is van ander groepe in die ongeweegde KVA bistipping as in die oorstemmende geweegde KVA bistipping. ’n Gedetailleerde ondersoek van voorheen gedefinieerde kwaliteitsmaatstawe van die KVA bistipping volg op die studie van die KVA bistipping. Daar is gevind dat die akkuraatheid waarmee die groepsgemiddeldes van groter groepe benader word in die geweegde KVA bistipping, gewoonlik hoër is as in die ooreenstemmende ongeweegde KVA bistippings. Drie nuwe kwaliteitsmaatstawe wat die akkuraatheid van die Pythagoras-afstande in die KVA bistipping meet, word gedefinieer. Hierdie kwaliteitsmaatstawe beskryf onderskeidelik die akkuraatheid van die voorstelling van die Pythagoras-afstande tussen die groepsgemiddeldes, die Pythagoras-afstande tussen die individuele observasies en die Pythagoras-afstande tussen die individuele observasies en groepsgemiddeldes in die KVA bistipping.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/80363
This item appears in the following collections: