Cluster analysis and classification of process data by use of principal curves

Van Coller, Cornelia Susanna

Files

vancoller_cluster_1999.pdf (18.51 MB)

Date

1999-12

Authors

Van Coller, Cornelia Susanna

Publisher

Stellenbosch : Stellenbosch University

Abstract

ENGLISH SUMMARY: In this thesis a new method of clustering as well as a new method of classification is proposed. Cluster analysis is a statistical method used to search for natural groups in an unstructured multivariate data set. Clusters are obtained in such a way that the observations belonging to the same group are more alike than observations across groups. For instance, long data records are found in mineral processing plants, where the data can be reduced to clusters according to different ore types. Most of the existing clustering methods do not give reliable results when applied to engineering data, since these methods were mainly developed in the domains of psychology and biology. Classification analysis can be regarded as the natural continuation of cluster analysis. In order to classify objects, two types of observations are needed. The first are those observations whose group memberships are known a priori, which can be acquired through cluster analysis. The second are those whose group memberships are unknown. By means of classification these observations are allocated to one of the existing groups. Both of the proposed techniques are based on the use of a smooth one-dimensional curve passing through the middle of the data set. To formalise such an idea, principal curves were developed by Hastie and Stuetzle (1989). A principal curve summarises the data in a non-linear fashion. For clustering, the principal curve of the entire unstructured data set is extracted. This one-dimensional representation of the data set is then used to search for different clusters. For classification, a principal curve is fitted to every known group in the data set. Each observation to be assigned is allocated to the known group closest to it. Clustering with principal curves grouped engineering data better than most of the well-known clustering algorithms. Some shortcomings of this method were also established. Classification with principal curves gave optimal results similar to those of some existing classification methods. This classification method can be applied to data of any distribution, unlike statistical classification techniques.
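The classification rule described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: fitting the principal curves (Hastie and Stuetzle, 1989) is not shown, each group's curve is assumed to be already available as a polyline of at least two vertices, and "closest" is assumed to mean the smallest Euclidean distance from the new point to a group's curve. The function names and the example curves `ore_A` and `ore_B` are hypothetical.

```python
import math

def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Clamp the projection parameter so the foot stays on the segment.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    foot = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, foot)

def distance_to_curve(p, curve):
    """Distance from p to a polyline given as a list of at least two vertices."""
    return min(point_to_segment_distance(p, a, b) for a, b in zip(curve, curve[1:]))

def classify(p, curves):
    """Return the label of the group whose curve lies closest to p."""
    return min(curves, key=lambda label: distance_to_curve(p, curves[label]))

# Hypothetical, hand-made curves standing in for fitted principal curves.
curves = {
    "ore_A": [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)],
    "ore_B": [(0.0, 3.0), (1.0, 2.5), (2.0, 3.0)],
}
print(classify((1.0, 0.8), curves))  # ore_A
```

Because the rule relies only on distances to curves, the sketch makes no assumption about the distribution of the data within each group, which is in line with the summary's closing remark.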
AFRIKAANSE OPSOMMING (English translation): In this thesis a new method is proposed for each of cluster analysis and classification analysis. Cluster analysis is a statistical technique with which natural groups are found in an unstructured multivariate data set. Groups are obtained in such a way that the observations in the same group are more alike than observations across groups. For example, long data records are common in mineral plants, and cluster analysis can reduce them to different groups corresponding to different ore types. The majority of existing clustering methods do not deliver reliable results when applied to engineering data, since these techniques mostly originated in the fields of psychology and biology. Classification analysis can be seen as the natural continuation of cluster analysis. To classify objects, two kinds of observations are used. The first kind comprises those observations with group identities known a priori, which can be found through cluster analysis. The second kind comprises the observations with unknown group identities. Each of these observations can be assigned to one of the existing groups by means of classification. Both of the proposed techniques are based on the use of a smooth, one-dimensional curve that passes through the middle of the data set. To formalise this idea, principal curves were developed by Hastie and Stuetzle (1989). A principal curve gives a non-linear summary of the data. For clustering purposes, a principal curve is extracted from the entire unstructured data set. For classification, a principal curve is fitted to every known group in the data set. The observation that must be assigned to one of the existing groups is placed in the group closest to the point concerned. Clustering with the aid of principal curves delivered better results on engineering data than most of the existing techniques. Certain shortcomings of this clustering method were established by means of practical examples. Classification with the aid of principal curves delivers optimal results similar to those of known comparable techniques. The proposed classification technique can be applied to data sets of any distribution, in contrast to the statistical classification techniques.
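The clustering idea in the abstract, searching a one-dimensional representation of the data for groups, can be sketched as follows. The abstract does not say how the search along the curve is done, so the gap rule here is purely an assumption made for illustration: each observation is projected onto a curve, and a new cluster starts wherever consecutive arc-length positions differ by more than a hypothetical threshold `min_gap`. The curve is assumed to be given as a polyline; fitting it is not shown.

```python
import math

def project_arc_length(p, curve):
    """Arc length along the polyline of the point on it closest to p."""
    best_dist, best_s, travelled = math.inf, 0.0, 0.0
    for a, b in zip(curve, curve[1:]):
        ab = [bi - ai for ai, bi in zip(a, b)]
        seg_len = math.hypot(*ab)
        if seg_len == 0:
            continue  # skip degenerate segments
        ap = [pi - ai for ai, pi in zip(a, p)]
        # Clamp so the projected point stays on this segment.
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / seg_len ** 2))
        foot = [ai + t * c for ai, c in zip(a, ab)]
        d = math.dist(p, foot)
        if d < best_dist:
            best_dist, best_s = d, travelled + t * seg_len
        travelled += seg_len
    return best_s

def cluster_by_gaps(points, curve, min_gap):
    """Label points; a new cluster starts at every gap larger than min_gap."""
    s = [project_arc_length(p, curve) for p in points]
    order = sorted(range(len(points)), key=lambda i: s[i])
    labels = [0] * len(points)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if s[nxt] - s[prev] > min_gap:
            current += 1
        labels[nxt] = current
    return labels

# Hypothetical data: a straight "curve" and two groups of points along it.
curve = [(0.0, 0.0), (10.0, 0.0)]
points = [(7.0, 0.1), (0.5, 0.2), (7.6, -0.2), (1.0, -0.1)]
print(cluster_by_gaps(points, curve, min_gap=2.0))  # [1, 0, 1, 0]
```

Cluster labels are numbered in order of position along the curve, so the choice of `min_gap` alone decides how many groups are reported.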

Description

Thesis (M.Ing.) -- University of Stellenbosch, 1999.

Keywords

Cluster analysis, Ore-dressing, Chemical plants, Metallurgical plants, Dissertations -- Chemical engineering

URI

http://hdl.handle.net/10019.1/51176

Collections

Masters Degrees (Chemical Engineering)
