Categorical CVA biplots

dc.contributor.advisor: Van der Merwe, Carel Johannes [en_ZA]
dc.contributor.author: Rodwell, David Timothy [en_ZA]
dc.contributor.other: Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. [en_ZA]
dc.date.accessioned: 2020-10-27T10:11:50Z
dc.date.accessioned: 2021-01-31T19:36:14Z
dc.date.available: 2020-10-27T10:11:50Z
dc.date.available: 2021-01-31T19:36:14Z
dc.date.issued: 2020-12
dc.description: Thesis (MCom)--Stellenbosch University, 2020. [en_ZA]
dc.description.abstract: ENGLISH ABSTRACT: In the modern era, great emphasis is placed on data visualisation, especially where a large amount of data is present. In these instances the data is usually high-dimensional and cannot be visualised using conventional means. Fortunately, there has been a recent surge in the use of biplots to visualise multivariate data, where a biplot can be described as a generalisation of a scatterplot. Moreover, these biplots use dimension reduction techniques to construct a two-dimensional representation of the data with non-orthogonal axes. However, at present there is no effective biplot construction technique that adequately separates classes when categorical data is present. Hence, this research builds upon an existing biplot construction technique by using elements from Canonical Variate Analysis (CVA) and non-linear Principal Component Analysis (PCA) to develop a technique that can perform class separation in cases where both numerical and categorical data are present. This novel biplot construction methodology forms the crux of this research assignment. Subsequently, the feasibility of this method was explored by considering the well-known Iris data set, where two variables are binned to form categorical variables. It is shown that this novel method improves upon existing biplot construction techniques in terms of classification accuracy and class separation. However, it is noted that this method can be extended by incorporating CVA in the iterative algorithm which solves for the optimal categorical level scores. A web-based Shiny application was built as a supplement to this paper, and can be found at https://davidrodwell.shinyapps.io/CategoricalCVABiplotApp/. Here the user can interact with the data sets, proposed methodology, and functionalities presented in this research. [en_ZA]
dc.description.abstract: AFRIKAANSE OPSOMMING (English translation): In the modern era, great emphasis is placed on the visualisation of data, especially where large data sets are involved. In these cases the data is usually high-dimensional in nature, which means that it cannot be represented visually by conventional means. Recent developments have led to an increase in the use of biplots to represent multivariate data, where biplots can be described as a generalisation of a scatterplot. These biplots use dimension reduction techniques to construct a two-dimensional representation of the data on a non-orthogonal axis system. Currently there is no effective biplot construction technique that can separate classes when categorical data is present. This research builds on an existing biplot construction technique, using elements of Canonical Variate Analysis (CVA) and non-linear Principal Component Analysis (PCA) to develop a technique that can separate classes in cases where both numerical and categorical data are present. This new biplot construction methodology forms the crux of this research assignment. The feasibility of this method was also investigated by considering the well-known Iris data set, where two variables are binned to become categorical variables. It is shown that this new method improves on existing biplot construction techniques in terms of classification accuracy and class separation. It was noted, however, that this method can be extended by incorporating CVA into the iterative algorithm that solves for the optimal categorical level scores. A web-based Shiny application was built as a supplement to this article, and can be found at https://davidrodwell.shinyapps.io/CategoricalCVABiplotApp/. Here the user can interact with the data sets, proposed methodology, and functionalities presented in this research. [af_ZA]
dc.description.version: Masters [en_ZA]
dc.format.extent: v, 30 pages : illustrations [en_ZA]
dc.identifier.uri: http://hdl.handle.net/10019.1/109122
dc.language.iso: en_ZA [en_ZA]
dc.publisher: Stellenbosch : Stellenbosch University [en_ZA]
dc.rights.holder: Stellenbosch University [en_ZA]
dc.subject: Biplots [en_ZA]
dc.subject: Canonical Variate Analysis (CVA) [en_ZA]
dc.subject: Categorical data [en_ZA]
dc.subject: Canonical correlation (Statistics) [en_ZA]
dc.subject: Information visualization [en_ZA]
dc.subject: Multivariate analysis [en_ZA]
dc.subject: UCTD
dc.title: Categorical CVA biplots [en_ZA]
dc.type: Thesis [en_ZA]
Files
Original bundle
Name: rodwell_categorical_2020.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text