Doctoral Degrees (Electrical and Electronic Engineering)
Language modelling for code-switched automatic speech recognition in five South African languages
Van der Westhuizen, Ewald; Niesler, T. R.; Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. Stellenbosch: Stellenbosch University, 2018-12.

ENGLISH ABSTRACT: Code-switching refers to natural, spontaneous language alternation by multilingual speakers during a conversation or utterance, and is prevalent in the everyday conversations of multilingual South Africans. Automatic speech recognition systems are generally highly optimised for monolingual input, and their performance deteriorates when presented with mixed-language speech. This thesis addresses the automatic recognition of speech containing code-switching between English and four South African Bantu languages, focussing specifically on the language modelling of English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. Due to the severe scarcity of code-switched speech data in South African languages, it was necessary to first develop a representative corpus. This new and unique 35-hour corpus contains segmented and transcribed code-switched speech from conversations in South African soap operas, which exhibit spontaneous utterances with regular code-switching in the target languages. Insertional, alternational, and intraword intrasentential code-switching are all represented in the data, as are some other special characteristics of fast, spontaneous Bantu speech such as postlexical deletion. The distribution of language switches is, however, extremely sparse. In this thesis, a number of data-driven modelling approaches were investigated and applied to address this sparsity by augmenting the training data with synthetically generated data.
Postlexical deletion was successfully modelled statistically with joint-sequence models, and these models were used to generate synthetic pronunciations that were shown to improve automatic speech recognition performance. Two new code-switched language modelling approaches were proposed to address data sparsity. First, parallel language-dependent language modelling (PLDLM), which consists of two monolingual language models with explicit language transitions, was shown to outperform a conventional language-independent language model in terms of recognition word error rate. Second, language models in which word embeddings were used to synthesise probable but unseen code-switched bigrams were considered. Including such synthesised code-switched bigrams reduced language model perplexity across a language switch boundary by up to 31%. Smaller improvements in recognition word error rate were also observed.
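As an illustration of the idea behind the second approach, the Python sketch below shows one possible way word embeddings could be used to propose unseen code-switched bigrams. It assumes that, for each observed code-switched bigram (w1, w2), candidate replacements for w2 are the k target-language words whose embeddings have the highest cosine similarity to w2. The function names, the toy two-dimensional vectors and the placeholder tokens (ZU_w1, EN_car, ...) are hypothetical and are not taken from the thesis, which may select and weight candidate bigrams differently.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Bigram = Tuple[str, str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def synthesise_cs_bigrams(
    seen: Set[Bigram],
    embeddings: Dict[str, Sequence[float]],
    target_vocab: List[str],
    k: int = 2,
) -> List[Bigram]:
    """Hypothetical helper: for each seen code-switched bigram (w1, w2),
    propose (w1, w2') for the k target-language words w2' closest to w2
    in embedding space. Bigrams already observed are never proposed."""
    synthetic: List[Bigram] = []
    for w1, w2 in sorted(seen):  # sorted for a deterministic order
        candidates = [w for w in target_vocab if w != w2 and (w1, w) not in seen]
        # Rank candidates by similarity to the word that followed the switch.
        candidates.sort(key=lambda w: cosine(embeddings[w2], embeddings[w]), reverse=True)
        for w in candidates[:k]:
            if (w1, w) not in synthetic:  # avoid duplicate proposals
                synthetic.append((w1, w))
    return synthetic


if __name__ == "__main__":
    # Toy embeddings, assumed purely for illustration.
    emb = {
        "EN_car": (1.0, 0.0),
        "EN_bus": (0.9, 0.1),
        "EN_train": (0.8, 0.3),
        "EN_happy": (0.0, 1.0),
    }
    vocab = ["EN_car", "EN_bus", "EN_train", "EN_happy"]
    seen = {("ZU_w1", "EN_car")}  # one observed Bantu-to-English switch
    print(synthesise_cs_bigrams(seen, emb, vocab, k=2))
    # [('ZU_w1', 'EN_bus'), ('ZU_w1', 'EN_train')]
```

In a complete system the proposed bigrams would still have to be assigned probabilities and merged into the language model before perplexity or word error rate could be measured; the abstract does not describe how this is done, so that step is left out of the sketch.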