A comparison of Gaussian mixture variants with application to automatic phoneme recognition

dc.contributor.advisor: Du Preez, J. A.
dc.contributor.author: Brand, Rinus [en_ZA]
dc.contributor.other: University of Stellenbosch. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.
dc.date.accessioned: 2008-02-14T09:08:27Z [en_ZA]
dc.date.accessioned: 2010-06-01T08:51:57Z
dc.date.available: 2008-02-14T09:08:27Z [en_ZA]
dc.date.available: 2010-06-01T08:51:57Z
dc.date.issued: 2007-12
dc.description: Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2007.
dc.description.abstract: The diagonal covariance Gaussian Probability Density Function (PDF) has been a very popular choice as the base PDF for Automatic Speech Recognition (ASR) systems. The only choices thus far have been between the spherical, diagonal and full covariance Gaussian PDFs. These classic methods have been used for some time, but no single document could be found that contains a comparative study of these methods as used in Pattern Recognition (PR). There is also a gap in complexity and speed between the diagonal and full covariance Gaussian implementations: the two methods differ drastically in accuracy, speed and size. There is a need to find one or more models that cover the area between these two classic methods. The objectives of this thesis are to evaluate three new PDF types that fit into the area between the diagonal and full covariance Gaussian implementations, broadening the choices for ASR; to document a comparative study of the three classic methods and the newly implemented methods (from previous work); and to construct a test system to evaluate these methods on phoneme recognition. The three classic density functions are examined, and issues regarding the theory, implementation and usefulness of each are discussed. A visual example of each is given to show the impact of the assumptions (if any) that it makes. The three newly implemented PDFs are the Sparse-, Probabilistic Principal Component Analysis- (PPCA) and Factor Analysis (FA) covariance Gaussian PDFs. Their theory, implementation and practical usefulness are shown and discussed. Again, visual examples are provided to show the differences in modelling methodology. The construction of a test system using two speech corpora is shown, including issues involving signal processing, PR and evaluation of the results. The NTIMIT and AST speech corpora were used to initialise and train the test system.
The use of the system to evaluate the PDFs discussed in this work is explained. The test results for the three new methods confirmed that they do fill the gap between the diagonal and full covariance Gaussians. In our tests the newly implemented methods produced a relative improvement in error rate of 0.3–4% over a similarly implemented diagonal covariance Gaussian, but took 35–78% longer to evaluate. Relative to the full covariance Gaussian, the error rates were 18–22% worse, but the evaluation times were 61–70% faster. When all the methods were scaled to approximately the same accuracy, all the above methods were 29–143% slower than the diagonal covariance Gaussian (excluding the spherical covariance method). [en_ZA]
dc.format.extent: 4604239 bytes [en_ZA]
dc.format.mimetype: application/pdf [en_ZA]
dc.identifier.uri: http://hdl.handle.net/10019.1/2549
dc.language.iso: en [en_ZA]
dc.publisher: Stellenbosch : University of Stellenbosch
dc.rights.holder: University of Stellenbosch
dc.subject: Gaussian density [en_ZA]
dc.subject: Principal components [en_ZA]
dc.subject: Speech processing [en_ZA]
dc.subject: Theses -- Electrical and electronic engineering [en_ZA]
dc.subject: Dissertations -- Electrical and electronic engineering [en_ZA]
dc.subject.lcsh: Automatic speech recognition [en_ZA]
dc.subject.lcsh: Factor analysis [en_ZA]
dc.subject.other: Electrical and Electronic Engineering [en_ZA]
dc.title: A comparison of Gaussian mixture variants with application to automatic phoneme recognition [en_ZA]
dc.type: Thesis [en_ZA]
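
Note on the abstract: the "gap" between the diagonal and full covariance Gaussians can be illustrated by counting the free covariance parameters of each model. The sketch below is not taken from the thesis. The function name, the feature dimension d = 39, the latent dimension q = 4 and the number of retained off-diagonal entries k = 20 are all hypothetical values chosen for illustration. The PPCA and FA counts use the usual textbook forms (a d x q loading matrix plus isotropic or diagonal noise, minus the rotational redundancy of the loadings), and the sparse count assumes a model that keeps the diagonal plus k off-diagonal entries, which may differ from the thesis's own definition.

```python
def covariance_param_counts(d, q, k):
    """Free covariance parameters of a d-dimensional Gaussian (mean excluded).

    d: feature dimension
    q: number of latent factors for PPCA / FA (assumed q < d)
    k: retained off-diagonal entries for the sparse model (an assumption)
    """
    # The loading matrix W (d x q) is only identifiable up to a q x q rotation,
    # which removes q(q-1)/2 degrees of freedom.
    rotation = q * (q - 1) // 2
    return {
        "spherical": 1,                  # sigma^2 * I
        "diagonal": d,                   # one variance per dimension
        "sparse": d + k,                 # diagonal plus k off-diagonal entries
        "ppca": d * q + 1 - rotation,    # W W^T + sigma^2 * I
        "fa": d * q + d - rotation,      # W W^T + diagonal noise
        "full": d * (d + 1) // 2,        # symmetric d x d matrix
    }


# Hypothetical settings, for illustration only.
counts = covariance_param_counts(d=39, q=4, k=20)
for name, n in counts.items():
    print(f"{name:>9}: {n}")
```

With these illustrative values the sparse, PPCA and FA models all sit between the diagonal and full covariance models in parameter count, which is in line with the intermediate speed and accuracy reported in the abstract.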
Files
Original bundle
Name: brand_comparison_2007.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format