Phoneme-based topic spotting on the Switchboard corpus

Date
2002-04
Authors
Theunissen, M. W. (Marthinus Wilhelmus)
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: The field of topic spotting in conversational speech deals with the problem of identifying "interesting" conversations or speech extracts contained within large volumes of speech data. Typical applications include surveillance and the screening of messages before they are referred to human operators. Closely related methods can also be used for data mining of multimedia databases, literature searches, language identification, call routing and message prioritisation. The first topic spotting systems used words as their basic units. However, because of the poor performance of speech recognisers, such systems need a large amount of topic-specific, hand-transcribed training data. For this reason researchers started concentrating on methods that use phonemes instead, because the recognition errors then occur on smaller, and therefore less important, units. Phoneme-based methods consequently make it feasible to use computer-generated transcriptions as training data. Building on word-based methods, a number of phoneme-based systems have emerged. The two most promising are the Euclidean Nearest Wrong Neighbours (ENWN) algorithm and the newly developed Stochastic Method for the Automatic Recognition of Topics (SMART). Previous experiments on the Oregon Graduate Institute of Science and Technology's Multi-Language Telephone Speech Corpus suggested that SMART yields a large improvement over ENWN, which had itself outperformed competing phoneme-based systems in evaluations. However, the small amount of data available for those experiments meant that more rigorous testing was required. In this research the algorithms were therefore re-implemented to run on the much larger Switchboard Corpus. A substantial improvement of SMART over ENWN was again observed, confirming the earlier result. In addition, an investigation was conducted into improving SMART. This resulted in a new counting strategy with a corresponding improvement in performance.
AFRIKAANS ABSTRACT (English translation): The field of topic spotting in speech deals with the problem of identifying "interesting" conversations or speech segments among large volumes of speech data. The technology is typically used to process conversations before they are referred to human operators. Related methods can also be used for data mining in multimedia databases, literature searches, language identification, call routing and message prioritisation. The first topic spotters were word-based, but because of the poor results obtained with speech recognisers, large amounts of hand-transcribed data are needed to train such systems. It is for this reason that researchers currently prefer phoneme-based approaches, since the errors occur on smaller, and therefore less important, units. Phoneme-based methods thus make it possible to use computer-generated transcriptions as training data. Several phoneme-based systems have emerged by building on word-based methods. The two most promising systems are the Euclidean Nearest Wrong Neighbours (ENWN) algorithm and the new Stochastic Method for the Automatic Recognition of Topics (SMART). Previous experiments on the Oregon Graduate Institute of Science and Technology's Multi-Language Telephone Speech Corpus indicated that the SMART algorithm performs better than the ENWN system, which had beaten other phoneme-based algorithms. Because too little data was available during those experiments, stricter tests were needed. In this research the algorithms were therefore re-implemented so that experiments could be carried out on the Switchboard Corpus. It was subsequently observed that SMART delivers considerably better results than ENWN, which confirmed the validity of the previous results. In addition, an investigation was launched to try to improve SMART. This led to a new counting strategy with an accompanying improvement in results.
Description
Thesis (MScEng)--Stellenbosch University, 2002.
Keywords
Automatic speech recognition, Speech processing systems, Pattern recognition systems, Dissertations -- Electronic engineering, Euclidean Nearest Wrong Neighbours (ENWN), Stochastic Method for the Automatic Recognition of Topics (SMART), Theses -- Electronic engineering