Application of cluster analysis and multidimensional scaling on medical schemes data
Thesis (MComm (Statistics and Actuarial Science))--Stellenbosch University, 2008.
Cluster analysis and multidimensional scaling (MDS) methods can be used to explore the structure in multidimensional data and can be applied in various fields of study. In this study, clustering techniques and MDS methods are applied to a data set from the health insurance field. The data set contains the number of medical scheme beneficiaries, aged 55 to 59, who are treated for certain combinations of chronic diseases. Clustering techniques and MDS methods are used to describe the interrelations among these chronic diseases and to identify clusters of chronic diseases. Because the chronic diseases are binary variables in the data set, similarity or dissimilarity measures between them are constructed before MDS methods or clustering techniques are applied. The dissimilarities can be calculated from various dissimilarity coefficients, and each coefficient produces a different set of dissimilarities. One of the aims of this study is to compare different dissimilarity coefficients, and it is shown that the Jaccard, Ochiai, Baroni-Urbani-Buser, Phi and Yule dissimilarity coefficients are the most suitable for this particular data set. MDS methods are used to produce a lower-dimensional display space in which the chronic diseases are represented by points and the distances between these points give an indication of how similar the chronic diseases are. The classical scaling, metric least squares scaling and nonmetric MDS methods are used in this study, and it is shown that nonmetric MDS is the most suitable of these for this particular data set. The Scaling by Majorizing a Complicated Function (SMACOF) algorithm is used to minimise the loss functions in this study and was found to perform well.
Clustering techniques are used to provide information about the clustering structure of the chronic diseases. Chronic diseases in the same cluster can be considered more similar to one another, while chronic diseases in different clusters are more dissimilar. The robust clustering techniques PAM, FANNY, AGNES and DIANA are applied to the data set. AGNES and DIANA were found to perform very well on the data set, while PAM and FANNY performed only marginally well.
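As an illustration of the first two steps described in this abstract, the following minimal Python sketch computes Jaccard dissimilarities between binary disease indicators and then applies classical scaling to obtain a two-dimensional display. It is not the thesis's implementation: the toy matrix `X`, the function names and the convention used for empty columns are assumptions made for illustration only, and the sketch covers neither nonmetric MDS nor the SMACOF algorithm.

```python
import numpy as np

def jaccard_dissimilarity(X):
    """Pairwise Jaccard dissimilarities between the binary columns of X."""
    X = np.asarray(X, dtype=float)
    both = X.T @ X                      # both[i, j]: rows where columns i and j are both 1
    counts = X.sum(axis=0)              # number of 1s in each column
    either = counts[:, None] + counts[None, :] - both  # rows where i or j is 1
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumed convention: two all-zero columns are treated as identical.
        sim = np.where(either > 0, both / either, 1.0)
    d = 1.0 - sim
    np.fill_diagonal(d, 0.0)
    return d

def classical_scaling(D, k=2):
    """Classical scaling: coordinates from the double-centred squared dissimilarities."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)     # negative eigenvalues are set to zero
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical toy data: rows are beneficiaries, columns are three chronic diseases.
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
])

D = jaccard_dissimilarity(X)
coords = classical_scaling(D, k=2)
print(np.round(D, 2))
print(np.round(coords, 2))
```

In such a display, diseases that are often treated together in the same beneficiaries lie close to one another. Replacing `jaccard_dissimilarity` with another coefficient would generally change the configuration, which is why the choice of coefficient matters.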