Validation of independent components using a hypothesis testing approach

De Koker, Corine (2020-12)

Thesis (MCom)--Stellenbosch University, 2020.

Thesis

ENGLISH ABSTRACT: The main focus of this thesis is the validation of Independent Component Analysis (ICA), a popular technique used in signal processing. In a typical application, ICA is used to extract non-Gaussian signals that represent the source signals from observed signals that are mixtures of those sources, in cases where the source signals themselves are unavailable or unknown. This thesis considers only the FastICA implementation of ICA, in the case where the number of source signals is equal to the number of mixture signals and any additive noise can be neglected. The FastICA algorithm extracts non-Gaussian signals through the maximisation of negentropy. The more non-Gaussian the source signals, the more closely the signals extracted using FastICA represent the source signals. Amongst other things, this thesis demonstrates a novel approach that uses hypothesis testing, with negentropy as the test statistic, to determine the degree of non-Gaussianity of the source signals. The results of this negentropy-based test were compared with the results of a second hypothesis test, which uses a measure suggested by Himberg et al. (2004) that quantifies the compactness of the clusters of estimates of ICA components. The clustering visualisation methods proposed by Himberg et al. (2004) were also applied in this thesis and provided visual support for the results of the hypothesis tests. Both hypothesis tests were performed on three different datasets. The first dataset contained mixtures of only non-Gaussian signals. The second dataset contained mixtures of three non-Gaussian and three Gaussian signals, while the third dataset contained mixtures of only Gaussian signals. When applied to the first dataset, both hypothesis tests rejected the null hypothesis that each of the source signals contained in the dataset is Gaussian, which is in line with our expectations. The results from both hypothesis tests indicated the presence of three Gaussian and three non-Gaussian source signals in the second dataset. Regarding the third dataset, both hypothesis tests rejected about 5% of the signals extracted by the FastICA algorithm, which was as expected since a significance level of 5% was used. Therefore, our results provide evidence that hypothesis testing could potentially be used as an alternative method to indicate the degree of non-Gaussianity of mixtures of source signals.
Key words: ICA; Hypothesis testing; non-Gaussianity
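
The idea of using negentropy as a test statistic for Gaussianity can be illustrated with a minimal sketch. The code below is not the procedure used in the thesis; it is an illustration under stated assumptions: negentropy is approximated with the log-cosh contrast commonly associated with FastICA, the null distribution is simulated by Monte Carlo from standard normal samples, and the test is applied to a single standardised signal rather than to components extracted by FastICA from mixtures. All function names, the sample size and the number of simulations are hypothetical choices made for the example.

```python
import numpy as np


def log_cosh(u):
    # Numerically stable log(cosh(u)) = log(e^u + e^-u) - log(2).
    return np.logaddexp(u, -u) - np.log(2.0)


def gaussian_log_cosh_mean():
    # E[log cosh(v)] for v ~ N(0, 1), computed with a Riemann sum on a fine grid.
    grid = np.linspace(-10.0, 10.0, 200001)
    pdf = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
    return np.sum(log_cosh(grid) * pdf) * (grid[1] - grid[0])


def negentropy_stat(signal, gauss_mean):
    # Assumed approximation: J(y) ~ (E[G(y)] - E[G(v)])^2 with G = log cosh,
    # evaluated on the signal after centring and scaling to unit variance.
    y = (signal - signal.mean()) / signal.std()
    return (log_cosh(y).mean() - gauss_mean) ** 2


def gaussianity_p_value(signal, n_sim=999, seed=0):
    # Monte Carlo p-value for H0: the signal is Gaussian.
    # Large negentropy is evidence against H0, so the test is one-sided.
    rng = np.random.default_rng(seed)
    gauss_mean = gaussian_log_cosh_mean()
    observed = negentropy_stat(signal, gauss_mean)
    null = np.array([
        negentropy_stat(rng.standard_normal(signal.size), gauss_mean)
        for _ in range(n_sim)
    ])
    return (1 + np.sum(null >= observed)) / (1 + n_sim)


# Illustrative usage on simulated data (not the datasets from the thesis).
rng = np.random.default_rng(42)
laplace_signal = rng.laplace(size=5000)          # heavy-tailed, non-Gaussian
gaussian_signal = rng.standard_normal(size=5000)

p_laplace = gaussianity_p_value(laplace_signal)
p_gaussian = gaussianity_p_value(gaussian_signal)
print("Laplace signal p-value:", p_laplace)
print("Gaussian signal p-value:", p_gaussian)
```

At a 5% significance level, a signal would be flagged as non-Gaussian when its p-value falls below 0.05. Testing components extracted by FastICA, as the thesis does, would require a null distribution that accounts for the algorithm having searched for the most non-Gaussian directions, which this sketch does not attempt.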

AFRIKAANS ABSTRACT (translated): The focus of this thesis is the validation of Independent Component Analysis (ICA), a popular technique in signal processing. The purpose of ICA is to estimate non-Gaussian signals that represent the source signals when only mixtures of the source signals are available. This thesis considers only the FastICA implementation of ICA, in the case where the number of source signals is equal to the number of mixture signals and additive noise can be neglected. FastICA estimates non-Gaussian signals through the maximisation of negentropy. The more non-Gaussian the source signals, the more closely the estimates from the FastICA algorithm represent the source signals. Amongst other things, this thesis demonstrates a new approach that uses hypothesis testing, with negentropy as the test statistic, to determine the degree of non-Gaussianity of the source signals. The results of this hypothesis test were compared with the results of a second hypothesis test, which uses a measure proposed by Himberg et al. (2004) that quantifies the compactness of clusters of estimates of ICA components. The clustering visualisation methods proposed by Himberg et al. (2004) were also applied in this thesis and provide visual support for the results of the hypothesis tests. Both hypothesis tests were performed on three different datasets. The first dataset consisted of mixtures of only non-Gaussian signals. The second dataset consisted of mixtures of three non-Gaussian and three Gaussian signals, while the third dataset consisted of mixtures of only Gaussian signals. When applied to the first dataset, both hypothesis tests rejected the null hypothesis that each signal in the dataset is Gaussian for all of the signals, which is in line with our expectations. The results of both hypothesis tests indicated approximately three Gaussian and three non-Gaussian signals in the second dataset. Regarding the third dataset, both hypothesis tests rejected 5% of the signals, which agrees with what was expected since a significance level of 5% was used. The conclusion is therefore that hypothesis testing has the potential to be used as an alternative method to determine the degree of non-Gaussianity of source signals, which can predict how accurately the estimated signals correspond to the source signals.
Keywords: Independent Component Analysis; Hypothesis testing; non-Gaussianity

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/109149