Large-Scale clustering of acoustic segments for sub-word acoustic modelling

Lerato, Lerato


dc.contributor.advisor	Niesler, T. R.	en_ZA
dc.contributor.author	Lerato, Lerato	en_ZA
dc.contributor.other	Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.	en_ZA
dc.date.accessioned	2019-02-01T07:48:22Z
dc.date.accessioned	2019-04-17T08:11:43Z
dc.date.available	2019-02-01T07:48:22Z
dc.date.available	2019-04-17T08:11:43Z
dc.date.issued	2019-04
dc.description	Thesis (PhD)--Stellenbosch University, 2019.	en_ZA
dc.description.abstract	ENGLISH ABSTRACT: A pronunciation dictionary is one of the key building blocks in automatic speech recognition (ASR) systems. However, pronunciation dictionaries used in state-of-the-art ASR systems are hand-crafted by linguists. This process requires expertise, time and funding, and as a consequence has not been carried out for many under-resourced languages. To address this, we develop a new unsupervised agglomerative hierarchical clustering (AHC) algorithm that can be used to discover sub-word units, which can in turn be used for the automatic induction of a pronunciation dictionary. The new algorithm, named multi-stage agglomerative hierarchical clustering (MAHC), addresses the O(N^2) memory and computational complexity observed when classical AHC is applied to large datasets. MAHC splits the data into independent subsets and applies AHC to each. The resultant clusters are merged, re-divided into subsets, and passed to the following iteration. Results show that MAHC can match and even surpass the performance of classical AHC. Furthermore, MAHC can automatically determine the optimal number of clusters, a feature not offered by most other approaches. A further refinement of MAHC, termed MAHC with memory size management (MAHC+M), addresses the case where some subsets may exhibit excessive growth during iterative clustering. MAHC+M is able to adhere to maximum memory constraints, which improves efficiency and is practically useful when using parallel computing resources. The input to MAHC is a matrix of pairwise distances computed with dynamic time warping (DTW). A modified form of DTW, named feature trajectory DTW (FTDTW), is introduced and shown to generally lead to better performance for both MAHC and MAHC+M. It is shown that clusters obtained using the MAHC algorithm can be used as sub-word units (SWUs) for acoustic modelling. Pronunciations in terms of these SWUs were obtained by alignment with the orthography.
Speech recognition experiments show that dictionaries induced using clusters obtained by FTDTW-based MAHC+M consistently outperform those obtained using DTW-based MAHC.	en_ZA
dc.format.extent	125 pages	en_ZA
dc.identifier.uri	http://hdl.handle.net/10019.1/105757
dc.language.iso	en_ZA	en_ZA
dc.publisher	Stellenbosch : Stellenbosch University	en_ZA
dc.rights.holder	Stellenbosch University	en_ZA
dc.subject	Large-Scale Clustering; Acoustic Segments; Sub-word; Acoustic Modelling	en_ZA
dc.subject	Automatic speech recognition	en_ZA
dc.subject	Agglomerations	en_ZA
dc.subject	Acoustical engineering	en_ZA
dc.subject	UCTD	en_ZA
dc.title	Large-Scale clustering of acoustic segments for sub-word acoustic modelling	en_ZA
dc.type	Thesis	en_ZA
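The abstract outlines MAHC as a loop: split the data into independent subsets, run AHC on each, pool the resulting clusters, re-divide them and iterate, with DTW distances as input. The Python sketch below illustrates that loop under assumptions that are NOT taken from the thesis: a round-robin split, halving the number of subsets each iteration, average linkage, and a fixed distance threshold as the stopping rule (MAHC itself determines the number of clusters automatically). It uses classical DTW, not FTDTW, and the names `dtw`, `ahc` and `mahc_sketch` are hypothetical.

```python
import math
from itertools import combinations


def dtw(x, y):
    """Classical DTW between two sequences of equal-dimension feature vectors."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def ahc(initial, dist, threshold):
    """Average-linkage AHC starting from the given clusters of segment indices.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (a fixed stopping rule assumed here for simplicity).
    """
    clusters = [list(c) for c in initial]  # copy so the caller's lists are untouched

    def linkage(a, b):
        return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i].extend(clusters[j])  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def mahc_sketch(segments, n_subsets, threshold):
    """Hypothetical multi-stage loop: split, cluster each subset, pool, repeat."""
    cache = {}

    def dist(a, b):
        # DTW distances are computed lazily, only for pairs that share a subset
        key = (min(a, b), max(a, b))
        if key not in cache:
            cache[key] = dtw(segments[a], segments[b])
        return cache[key]

    groups = [[i] for i in range(len(segments))]
    p = n_subsets
    while True:
        # Round-robin split of the current clusters into p subsets
        subsets = [groups[k::p] for k in range(p)]
        # Cluster each subset independently, then pool the resulting clusters
        groups = [c for s in subsets if s for c in ahc(s, dist, threshold)]
        if p == 1:
            return groups
        p = max(1, p // 2)  # fewer, larger subsets in the next iteration
```

For example, with two well-separated groups of short one-dimensional sequences, `mahc_sketch(segments, n_subsets=2, threshold=2.0)` should recover the two groups. Because average linkage compares every member pair, the final single-subset pass in this sketch still touches all pairwise distances; a memory-bounded implementation would compare cluster representatives instead.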

Files

Original bundle

Name:: lerato_scale_2019.pdf
Size:: 5.38 MB
Format:: Adobe Portable Document Format

Collections

Doctoral Degrees (Electrical and Electronic Engineering)