Measuring, refining and calibrating speaker and language information extracted from speech

dc.contributor.advisor: Du Preez, J. A. [en_ZA]
dc.contributor.author: Brummer, Niko [en_ZA]
dc.contributor.other: University of Stellenbosch. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.
dc.date.accessioned: 2010-11-15T14:41:37Z [en_ZA]
dc.date.accessioned: 2010-12-15T10:14:24Z
dc.date.available: 2010-11-15T14:41:37Z [en_ZA]
dc.date.available: 2010-12-15T10:14:24Z
dc.date.issued: 2010-12 [en_ZA]
dc.description: Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. [en_ZA]
dc.description.abstract: ENGLISH ABSTRACT: We propose a new methodology, based on proper scoring rules, for the evaluation of the goodness of pattern recognizers with probabilistic outputs. The recognizers of interest take an input, known to belong to one of a discrete set of classes, and output a calibrated likelihood for each class. This is a generalization of the traditional use of proper scoring rules to evaluate the goodness of probability distributions. A recognizer with outputs in well-calibrated probability distribution form can be applied to make cost-effective Bayes decisions over a range of applications having different cost functions. A recognizer with likelihood output can additionally be employed for a wide range of prior distributions for the to-be-recognized classes. We use automatic speaker recognition and automatic spoken language recognition as prototypes of this type of pattern recognizer. The traditional evaluation methods in these fields, as represented by the series of NIST Speaker and Language Recognition Evaluations, evaluate hard decisions made by the recognizers. This makes these recognizers cost-and-prior-dependent. The proposed methodology generalizes that of the NIST evaluations, allowing for the evaluation of recognizers which are intended to be usefully applied over a wide range of applications having variable priors and costs. The proposal includes a family of evaluation criteria, where each member of the family is formed by a proper scoring rule. We emphasize two members of this family: (i) a non-strict scoring rule, directly representing error-rate at a given prior; (ii) the strict logarithmic scoring rule, which represents information content, or equivalently summarized error-rate, or expected cost, over a wide range of applications.
We further show how to form a family of secondary evaluation criteria which, by contrasting with the primary criteria, form an analysis of the goodness of calibration of the recognizers' likelihoods. Finally, we show how to use the logarithmic scoring rule as an objective function for the discriminative training of fusion and calibration of speaker and language recognizers. [en_ZA]
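The abstract's central chain (class likelihoods, combined with an application's prior, give a posterior; the posterior and a cost function give a Bayes decision; the logarithmic rule scores the posterior against the true class) can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the two-class example values and the zero-one cost matrix are all assumptions chosen for illustration, and the thesis's own evaluation criteria may be normalized or averaged differently.

```python
import math

def posterior_from_loglik(loglik, prior):
    """Combine per-class log-likelihoods with a prior using Bayes' rule."""
    # Work in the log domain; subtract the max before exponentiating for stability.
    joint = [ll + math.log(p) for ll, p in zip(loglik, prior)]
    m = max(joint)
    unnorm = [math.exp(j - m) for j in joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bayes_decision(posterior, cost):
    """Return the index of the decision with minimum expected cost.

    cost[d][c] is the (hypothetical) cost of decision d when the true class is c.
    """
    expected = [sum(row[c] * posterior[c] for c in range(len(posterior)))
                for row in cost]
    return min(range(len(expected)), key=expected.__getitem__)

def log_score(posterior, true_class):
    """Logarithmic scoring rule: -log2 of the probability given to the true class."""
    return -math.log2(posterior[true_class])

# Illustrative values only: class 1 is four times as likely as class 0.
loglik = [0.0, math.log(4.0)]
zero_one_cost = [[0.0, 1.0],   # decide class 0
                 [1.0, 0.0]]   # decide class 1

post_flat = posterior_from_loglik(loglik, [0.5, 0.5])    # about [0.2, 0.8]
post_skewed = posterior_from_loglik(loglik, [0.9, 0.1])  # about [0.69, 0.31]

decision_flat = bayes_decision(post_flat, zero_one_cost)      # class 1
decision_skewed = bayes_decision(post_skewed, zero_one_cost)  # class 0
penalty_bits = log_score(post_flat, true_class=1)
```

The same likelihoods lead to different decisions under the two priors, which is the sense in which a likelihood output can serve a range of applications, while a hard decision is tied to the one prior and cost function it was made for.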
dc.description.abstract: AFRIKAANS SUMMARY: We show how to represent, measure, calibrate and optimize the uncertainty in the output of automatic speaker recognition and language recognition systems. This makes the existing technology more accurate, more effective and more generally applicable. [af]
dc.format.extent: 160 p. : ill.
dc.identifier.uri: http://hdl.handle.net/10019.1/5139
dc.publisher: Stellenbosch : University of Stellenbosch
dc.rights.holder: University of Stellenbosch
dc.subject: Automatic speaker recognition [en_ZA]
dc.subject: Automatic spoken language recognition [en_ZA]
dc.subject: Proper scoring rule [en_ZA]
dc.subject: Calibration [en_ZA]
dc.subject: Dissertations -- Electronic engineering [en]
dc.subject: Theses -- Electronic engineering [en]
dc.subject: Automatic speech recognition [en]
dc.subject: Speech processing systems [en]
dc.title: Measuring, refining and calibrating speaker and language information extracted from speech [en_ZA]
dc.type: Thesis
Files
Original bundle
Name: brummer_measuring_2010.pdf
Size: 1.35 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Plain Text