A parametric monophone speech synthesis system
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006.
Speech is the primary and most natural means of communication between human beings. With the rapid spread of technology across the globe and the increased number of personal and public applications for digital equipment in recent years, the need for human/machine interaction has increased dramatically. Synthetic speech is audible speech produced automatically by a machine. A text-to-speech (TTS) system is one that converts bodies of text into digital speech signals which can be heard and understood by a person. Current TTS systems generally require large annotated speech corpora in the languages for which they are developed. For many languages these resources are not available. In their absence, a TTS system can instead generate synthetic speech by means of mathematical algorithms constrained by a set of rules. This thesis describes the design and implementation of a rule-based speech generation algorithm for use in a TTS system. The system allows the type, emphasis, pitch and other parameters associated with a sound and its particular mode of articulation to be specified. However, no attempt is made to model prosodic and other higher-level information. Instead, this information is assumed to be known. The algorithm uses linear predictive (LP) models of monophone speech units, which greatly reduces the amount of data required for development in a new language. A novel approach to the interpolation of monophone speech units is presented to allow realistic transitions between monophone units. Additionally, novel algorithms for the estimation and modelling of the harmonic and stochastic content of an excitation signal are presented. These are used to determine the amount of voiced and unvoiced energy present in individual speech sounds. Promising results were obtained when evaluating the developed system’s South African English speech output using two widely used speech intelligibility tests, namely the modified rhyme test (MRT) and semantically unpredictable sentences (SUS).
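The general idea of LP-based synthesis mentioned in the abstract can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: the function names `mixed_excitation` and `lp_synthesize`, the impulse-train model of voiced excitation, the white-noise model of unvoiced excitation, and the square-root weighting are all assumptions made for illustration only. The sketch builds an excitation signal from a weighted pulse train and noise, then passes it through an all-pole filter defined by LP predictor coefficients, so that s[n] = e[n] + sum over k of a_k s[n-k].

```python
import random


def mixed_excitation(n_samples, pitch_period, voiced_fraction, seed=0):
    """Return a weighted sum of an impulse train and white Gaussian noise.

    Hypothetical helper: voiced_fraction in [0, 1] sets how strongly the
    periodic (voiced) part is weighted against the noisy (unvoiced) part.
    """
    rng = random.Random(seed)
    voiced_gain = voiced_fraction ** 0.5
    unvoiced_gain = (1.0 - voiced_fraction) ** 0.5
    excitation = []
    for n in range(n_samples):
        # One unit pulse at the start of every pitch period.
        pulse = 1.0 if n % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        excitation.append(voiced_gain * pulse + unvoiced_gain * noise)
    return excitation


def lp_synthesize(predictor_coeffs, excitation):
    """Run the excitation through an all-pole LP synthesis filter.

    predictor_coeffs holds a_1..a_p, and each output sample is
    s[n] = e[n] + sum_{k=1..p} a_k * s[n-k], with s[n] = 0 for n < 0.
    """
    order = len(predictor_coeffs)
    output = []
    for n, e in enumerate(excitation):
        sample = e
        for k in range(1, order + 1):
            if n - k >= 0:
                sample += predictor_coeffs[k - 1] * output[n - k]
        output.append(sample)
    return output


# Usage: a fully voiced excitation through an assumed first-order model.
excitation = mixed_excitation(n_samples=8, pitch_period=4, voiced_fraction=1.0)
speech = lp_synthesize([0.5], excitation)
```

In a monophone system of the kind described, each speech unit would supply its own set of predictor coefficients and its own voiced/unvoiced balance. How the thesis estimates those quantities and interpolates between units is not stated in the abstract and is not reproduced here.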