Fusion of phoneme recognisers for South African English

Strydom, George Wessel (2009-03)

Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2009.

Thesis

ENGLISH ABSTRACT: Phoneme recognition systems typically suffer from low classification accuracy. Recognition for South African English is especially difficult because of the variety of very different accent groups. This thesis investigates whether a fusion of classifiers, each trained on a specific accent group, can outperform a single general classifier trained on all groups. We implemented basic voting and score fusion techniques, which gave a small increase in classifier accuracy. To ensure that similar output scores from different classifiers imply the same opinion, these classifiers need to be calibrated before fusion. The main focus of this thesis is calibration with the Pool Adjacent Violators (PAV) algorithm. We achieved impressive gains in accuracy with this method, and made an in-depth investigation into the role of the prior and its connection with the proportion of target to non-target scores. Calibration and fusion using the information metric Cllr were shown to perform impressively on synthetic data, but only minor increases in accuracy were found for our phoneme recognition system. The best results for this technique were achieved by calibrating each classifier individually, fusing these calibrated classifiers and then finally calibrating the fused system. Boosting and Bagging classifiers were also briefly investigated as possible phoneme recognisers, but our attempt did not reach the accuracy of the baseline classifier trained on all the accent groups. The inherent difficulties typical of phoneme recognition were highlighted: low per-class accuracies, a large number of classes and an unbalanced speech corpus all had a negative influence on the effectiveness of the tested calibration and fusion techniques.
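PAV calibration, as described in the abstract, can be illustrated with a minimal sketch. It assumes each classifier's output for one phoneme is treated as a detection score, with a held-out calibration set labelled 1 (target) or 0 (non-target). PAV fits a non-decreasing posterior to the score-ordered labels. Subtracting the log-odds of the calibration set's own target proportion then turns that posterior into a log-likelihood ratio (LLR), which is one common way to expose the link between the prior and the target/non-target proportion. The function names, the `eps` clipping and the lookup rule are illustrative assumptions, not the thesis's implementation, and tied scores are not pooled specially.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def pav(labels: Sequence[int]) -> List[float]:
    """Pool Adjacent Violators: least-squares non-decreasing fit to 0/1
    labels that are already ordered by ascending classifier score."""
    blocks: List[List[float]] = []  # each block is [sum of labels, count]
    for y in labels:
        blocks.append([float(y), 1.0])
        # Pool adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


def pav_calibrate(scores: Sequence[float], labels: Sequence[int],
                  eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Map raw scores to LLRs via PAV (hypothetical helper).

    Returns the sorted calibration scores and the LLR for each of them.
    Both classes must be present in `labels`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    sorted_scores = [scores[i] for i in order]
    posteriors = pav([labels[i] for i in order])
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    # PAV yields posteriors under the calibration set's own target
    # proportion; removing those prior log-odds leaves an LLR.
    prior_log_odds = math.log(n_tar / n_non)
    llrs: List[float] = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)  # avoid infinite LLRs at 0 and 1
        llrs.append(math.log(p / (1.0 - p)) - prior_log_odds)
    return sorted_scores, llrs


def apply_calibration(score: float, sorted_scores: List[float],
                      llrs: List[float]) -> float:
    """Piecewise-constant lookup: LLR of the nearest calibration score at
    or below `score` (or of the lowest one if none is)."""
    idx = bisect.bisect_right(sorted_scores, score) - 1
    return llrs[max(idx, 0)]
```

For example, `pav([0, 0, 1, 0, 1, 1])` returns `[0.0, 0.0, 0.5, 0.5, 1.0, 1.0]`: the out-of-order pair `1, 0` is pooled into a block with mean 0.5. Because the resulting map is monotone, calibration reorders nothing within one classifier. What it changes is the scale, so that scores from different accent-specific classifiers become comparable before fusion.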

AFRIKAANS ABSTRACT: Phoneme recognition systems typically have low classification accuracy. Because of the variety of different accent groups, recognition for South African English is especially difficult. This thesis investigates whether a fusion of classifiers, each trained on a specific accent group, can do better than a single classifier trained on all groups. We implemented basic voting and score fusion techniques, which led to a small improvement in classifier accuracy. To ensure that similar output scores from different classifiers imply the same opinion, these classifiers must be calibrated before fusion. The main focus of this thesis is calibration with the Pool Adjacent Violators algorithm. Impressive increases in accuracy were achieved with this method, and an in-depth investigation was carried out into the role of the prior probabilities and their relationship with the ratio of target to non-target scores. Calibration and fusion using the information metric Cllr deliver impressive results with synthetic data, but only small improvements in accuracy were found for our phoneme recognition system. The best results for this technique were obtained by calibrating each classifier separately, then combining these calibrated classifiers, and then calibrating the final combined system again. Boosting and Bagging classifiers were also briefly investigated as possible phoneme recognisers. Our attempt did not reach the accuracy of our baseline classifier (which is trained on all the data). The inherent problems typical of phoneme recognition were pointed out. Low per-class accuracy, a large number of classes and an unbalanced speech corpus all had a negative influence on the effectiveness of the tested calibration and fusion techniques.
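The following minimal sketch shows how Cllr can serve both as an evaluation metric and as a fusion objective. It uses the usual definition of the log-likelihood-ratio cost, C_llr = ½[mean over targets of log2(1 + e^(−llr)) + mean over non-targets of log2(1 + e^(llr))], under which 1 bit corresponds to an uninformative system. Fusion is assumed here to be linear: each row holds one score per accent-specific classifier for the same test token, and the weights and bias are fitted by plain gradient descent on Cllr. This is essentially prior-weighted logistic regression and is a deliberately simple stand-in; the thesis may use a different optimiser. All function names and the `lr` and `iters` values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cllr(tar_llrs: Sequence[float], non_llrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits; 1.0 means no useful information."""
    c_tar = sum(softplus(-x) for x in tar_llrs) / len(tar_llrs)
    c_non = sum(softplus(x) for x in non_llrs) / len(non_llrs)
    return (c_tar + c_non) / (2.0 * math.log(2.0))


def fuse(row: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear score fusion: bias + sum_k weights[k] * row[k]."""
    return bias + sum(w * s for w, s in zip(weights, row))


def train_fusion(tar_rows: Sequence[Sequence[float]],
                 non_rows: Sequence[Sequence[float]],
                 lr: float = 0.1, iters: int = 2000) -> Tuple[List[float], float]:
    """Fit fusion weights and bias by plain gradient descent on Cllr."""
    k = len(tar_rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * k, 0.0
        # Targets contribute softplus(-x), non-targets softplus(x);
        # d/dx softplus(sign * x) = sign * sigmoid(sign * x).
        for rows, sign in ((tar_rows, -1.0), (non_rows, 1.0)):
            scale = 1.0 / (2.0 * len(rows) * math.log(2.0))
            for row in rows:
                g = sign * sigmoid(sign * fuse(row, weights, bias)) * scale
                grad_b += g
                for j in range(k):
                    grad_w[j] += g * row[j]
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

If every LLR is zero, `cllr` returns 1 bit (up to rounding). A fused system is only useful if it drives the cost below that. The calibrate, fuse, then calibrate again ordering described in the abstract would feed per-classifier calibrated scores into `train_fusion`, then pass the fused output through a second calibration stage.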

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/4065