Automatic discovery of subword units and pronunciations for automatic speech recognition using TIMIT

Date
2010-11
Authors
Goussard, George
Niesler, Thomas
Publisher
PRASA
Abstract
We address the automatic generation of acoustic subword units and an associated pronunciation dictionary for speech recognition. The speech audio is first segmented into phoneme-like units by detecting points at which the spectral characteristics of the signal change abruptly. These audio segments are subsequently subjected to agglomerative clustering in order to group similar acoustic segments. Finally, the orthography is iteratively aligned with the resulting transcription in terms of audio clusters in order to determine pronunciations of the training words. The approach is evaluated by applying it to two subsets of the TIMIT corpus, both of which have a closed vocabulary. It is found that, when vocabulary words occur often in the training set, the proposed technique delivers performance that is close to, but lower than, that of a system based on the TIMIT phonetic transcriptions. When vocabulary words are not repeated often in the training set, the best system is able to outperform its counterpart based on the TIMIT phonetic transcriptions, although recognition performance in both cases is poor.
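The first two steps of the abstract (segmentation at abrupt spectral changes, then agglomerative clustering of the segments) can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean frame-to-frame distance, the fixed threshold, the centroid linkage, the toy feature vectors, and all function names are assumptions made for illustration only. The third step, iterative alignment of the orthography with the cluster transcription, is not sketched here.

```python
import math

def segment_boundaries(frames, threshold):
    """Mark a boundary wherever consecutive feature vectors differ by more
    than `threshold` (assumed stand-in for 'abrupt spectral change')."""
    bounds = [0]
    for t in range(1, len(frames)):
        if math.dist(frames[t - 1], frames[t]) > threshold:
            bounds.append(t)
    bounds.append(len(frames))
    return bounds

def segment_means(frames, bounds):
    """Represent each segment by the mean of its feature vectors."""
    means = []
    for start, end in zip(bounds, bounds[1:]):
        seg = frames[start:end]
        dim = len(seg[0])
        means.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
    return means

def agglomerate(points, n_clusters):
    """Naive centroid-linkage agglomerative clustering (assumed linkage).
    Repeatedly merges the two closest clusters until n_clusters remain,
    then returns one cluster label per input point."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy 2-D "spectral" frames: two similar regions separated by a different one.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [0.1, 0.1]]
bounds = segment_boundaries(frames, threshold=1.0)
labels = agglomerate(segment_means(frames, bounds), n_clusters=2)
```

In this toy input the first and last segments fall into the same cluster, so the label sequence plays the role of a transcription in terms of audio clusters.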
Description
Both authors from Stellenbosch University.
Proceedings of the twenty-first annual symposium of the Pattern Recognition Association of South Africa (PRASA), Stellenbosch, South Africa, November 2010.
Keywords
Automatic subword unit discovery, Automatic speech recognition, TIMIT
Citation
Goussard, GW & Niesler, TR 2010. Automatic discovery of subword units and pronunciations for automatic speech recognition using TIMIT. Proceedings of the twenty-first annual symposium of the Pattern Recognition Association of South Africa (PRASA), Stellenbosch, South Africa, November 2010.