Doctoral Degrees (Statistics and Actuarial Science)
Browsing Doctoral Degrees (Statistics and Actuarial Science) by Author "Gardner, Sugnet"
- Extensions of biplot methodology to discriminant analysis with applications of non-parametric principal components (Stellenbosch : Stellenbosch University, 2001). Gardner, Sugnet; Le Roux, N. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistical and Actuarial Science. ENGLISH ABSTRACT: Gower and Hand offer a new perspective on the traditional biplot. This perspective provides a unified approach to principal component analysis (PCA) biplots based on Pythagorean distance; canonical variate analysis (CVA) biplots based on Mahalanobis distance; non-linear biplots based on Euclidean embeddable distances; and generalised biplots for use with both continuous and categorical variables. The biplot methodology of Gower and Hand is extended and applied in statistical discrimination and classification. This leads to discriminant analysis by means of PCA biplots, CVA biplots, non-linear biplots and generalised biplots. Properties of these techniques are derived in detail. Classification regions defined for linear discriminant analysis (LDA) are applied in the CVA biplot, leading to discriminant analysis using biplot methodology. Situations where the assumptions of LDA are not met are considered, and various existing alternative discriminant analysis procedures are formulated in terms of biplots: apart from PCA biplots, QDA, FDA and DSM biplots are defined and constructed, and their usage is illustrated. It is demonstrated that biplot methodology naturally provides for managing categorical and continuous variables simultaneously. It is shown through a simulation study that the techniques based on biplot methodology can be applied successfully to the reversal problem with categorical variables in discriminant analysis. Situations occurring in practice where existing discriminant analysis procedures based on distances from means fail are considered.
After discussing self-consistency and principal curves (a form of non-parametric principal components), discriminant analysis based on distances from principal curves (a form of conditional mean) is proposed. This biplot classification procedure based upon principal curves yields much better results. Bootstrapping is considered as a means of describing variability in biplots. Variability in samples as well as of axes in biplot displays receives attention. Bootstrap α-regions are defined and the ability of these regions to describe biplot variability and to detect outliers is demonstrated. Robust PCA and CVA biplots restricting the role of influential observations on biplot displays are also considered. An extensive library of S-PLUS computer programmes is provided for implementing the various discriminant analysis techniques that were developed using biplot methodology. The application of the above theoretical developments and computer software is illustrated by analysing real-life data sets. Biplots are used to investigate the degree of capital intensity of companies and to serve as an aid in risk management of a financial institution. A particular application of the PCA biplot is the TQI biplot used in industry to determine the degree to which manufactured items comply with multidimensional specifications. A further interesting application is to determine whether an Old-Cape furniture item is manufactured of stinkwood or embuia. A data set provided by the Western Cape Nature Conservation Board consisting of measurements of tortoises from the species Homopus areolatus is analysed by means of biplot methodology to determine if morphological differences exist among tortoises from different geographical regions. Allometric considerations need to be taken into account and the resulting small sample sizes in some subgroups severely limit the use of conventional statistical procedures.
Biplot methodology is also applied to classification in a diabetes data set, illustrating the combined advantage of using classification with principal curves in a robust biplot or biplot classification where covariance matrices are unequal. A discriminant analysis problem where foraging behaviour of deer might eventually result in a change in the dominant plant species is used to illustrate biplot classification of data sets containing both continuous and categorical variables. As an example of the use of biplots with large data sets, a data set consisting of 16828 lemons is analysed using biplot methodology to investigate differences in fruit from various areas of production, cultivars and rootstocks. The proposed α-bags also provide a measure for quantifying the graphical overlap among classes. This method is successfully applied in a multidimensional socio-economic data set to quantify the degree of overlap among different race groups. The application of the proposed biplot methodology in practice has an important byproduct: it provides the impetus for many a new idea, e.g. applying a PCA biplot in industry led to the development of quality regions; α-bags were constructed to represent thousands of observations in the lemons data set, in turn leading to means for quantifying the degree of overlap. This illustrates the enormous flexibility of biplots - biplot methodology provides an infrastructure for many novelties when applied in practice.