Human and automatic accent identification of Nguni and Sotho Black South African English
It is well established that accent can have a detrimental effect on the performance of automatic speech recognition (ASR) systems. Although accents can be labelled according to a speaker's mother tongue, it remains to be determined whether, and when, this distinction is useful for the development of ASR technology. This study compares the varieties of South African English produced by mother-tongue speakers of the Nguni and Sotho languages, who together account for over 70% of the country's population. The aim of the investigation was to determine whether ASR systems should treat these two accent groups as a single variety or whether it is better to model them separately. To this end, two sets of experiments were carried out. First, a perceptual experiment was performed in which human listeners were asked to classify different English accents. Subsequently, automatic experiments were conducted to determine how the accuracy of an automatic accent identification system compares with these perceptual results, and whether the acoustic models of an ASR system benefit from incorporating the Nguni/Sotho accent distinction. The results of the perceptual experiment indicated that most listeners could not correctly identify a speaker's mother tongue from their English accent. This finding was supported by the results of the automatic accent identification and speech recognition experiments.