Multiple outlier detection and cluster analysis of multivariate normal data

dc.contributor.advisor: Herbst, B. M. [en_ZA]
dc.contributor.advisor: Muller, N. L. [en_ZA]
dc.contributor.author: Robson, Geoffrey [en_ZA]
dc.contributor.other: Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. [en_ZA]
dc.date.accessioned: 2012-08-27T11:35:30Z
dc.date.available: 2012-08-27T11:35:30Z
dc.date.issued: 2003-12
dc.description: Thesis (MScEng)--Stellenbosch University, 2003. [en_ZA]
dc.description.abstract: ENGLISH ABSTRACT: Outliers may be defined as observations that are sufficiently aberrant to arouse the analyst's suspicion as to their origin. They could be the result of human error, in which case they should be corrected, but they may also be interesting exceptions that deserve further investigation. Identification of outliers typically consists of an informal inspection of a plot of the data, but this is unreliable for dimensions greater than two. A formal procedure for detecting outliers allows for consistency when classifying observations. It also makes it possible to automate the detection of outliers by computer. The special case of univariate data is treated separately to introduce essential concepts, and also because it may well be of interest in its own right. We then consider techniques for detecting multiple outliers in a multivariate normal sample, and go on to explain how these may be generalized to include cluster analysis. Multivariate outlier detection is based on the Minimum Covariance Determinant (MCD) subset, which is therefore treated in detail. Exact bivariate algorithms were refined and implemented, and their solutions were used to establish the performance of the commonly used heuristic, Fast-MCD. [en_ZA]
dc.description.abstract: AFRIKAANSE OPSOMMING: Uitskieters word gedefinieer as waarnemings wat tot só ’n mate afwyk van die verwagte gedrag dat die analis wantrouig is oor die oorsprong daarvan. Hierdie waarnemings mag die resultaat wees van menslike foute, in welke geval dit reggestel moet word. Dit mag egter ook ’n interessante verskynsel wees wat verdere ondersoek benodig. Die identifikasie van uitskieters word tipies informeel deur inspeksie vanaf ’n grafiese voorstelling van die data uitgevoer, maar hierdie benadering is onbetroubaar vir dimensies groter as twee. ’n Formele prosedure vir die bepaling van uitskieters sal meer konsekwente klassifisering van steekproefdata tot gevolg hê. Dit gee ook geleentheid vir effektiewe rekenaar implementering van die tegnieke. Aanvanklik word die spesiale geval van eenveranderlike data behandel om noodsaaklike begrippe bekend te stel, maar ook aangesien dit in eie reg ’n area van groot belang is. Verder word tegnieke vir die identifikasie van verskeie uitskieters in meerveranderlike, normaal verspreide data beskou. Daar word ook ondersoek hoe hierdie idees veralgemeen kan word om tros analise in te sluit. Die sogenaamde Minimum Covariance Determinant (MCD) subversameling is fundamenteel vir die identifikasie van meerveranderlike uitskieters, en word daarom in detail ondersoek. Deterministiese tweeveranderlike algoritmes is verfyn en geïmplementeer, en gebruik om die effektiwiteit van die algemeen gebruikte heuristiese algoritme, Fast-MCD, te ondersoek. [af_ZA]
dc.format.extent: 127 p. : ill.
dc.identifier.uri: http://hdl.handle.net/10019.1/53508
dc.language.iso: en_ZA
dc.publisher: Stellenbosch : Stellenbosch University
dc.rights.holder: Stellenbosch University
dc.subject: Multivariate analysis [en_ZA]
dc.subject: Outliers (Statistics) [en_ZA]
dc.subject: Data editing [en_ZA]
dc.subject: Minimum Covariance Determinant (MCD) [en_ZA]
dc.subject: Dissertations -- Applied mathematics [en_ZA]
dc.subject: Theses -- Applied mathematics [en_ZA]
dc.subject: Dissertations -- Mathematical sciences [en_ZA]
dc.subject: Theses -- Mathematical sciences [en_ZA]
dc.title: Multiple outlier detection and cluster analysis of multivariate normal data [en_ZA]
dc.type: Thesis [en_ZA]
Files
Original bundle
Name: robson_multiple_2003.pdf
Size: 836.81 KB
Format: Adobe Portable Document Format