Prosodic features of imperatives in Xhosa : implications for a text-to-speech system

Swart, Philippa H. (2000-03)

Thesis (MA)--University of Stellenbosch, 2000.

Thesis

ENGLISH ABSTRACT: This study focuses on the prosodic features of imperatives and the role of prosody in the development of a text-to-speech (TTS) system for Xhosa, an African tone language. The perception of prosody is manifested in suprasegmental features such as fundamental frequency (pitch), intensity (loudness) and duration (length). Very little experimental research has been done on the prosodic features of any grammatical structures (moods and tenses) in Xhosa; it has therefore not yet been determined how, and to what degree, the different prosodic features are combined and utilized in the production and perception of Xhosa speech. One such grammatical structure, for which no explicit descriptive phonetic information exists, is the imperative mood, which expresses commands. This study shows how the relationship between duration, pitch and loudness, as manifested in the production and perception of Xhosa imperatives, could be determined through acoustic analyses and perceptual experiments. An experimental phonetic approach proved to be essential for the acquisition of substantial and reliable prosodic information. An extensive acoustic analysis was conducted to acquire prosodic information on the production of imperatives by Xhosa mother-tongue speakers. Subsequently, various statistical parameters were calculated on the raw acoustic data (i) to establish patterns of significance and (ii) to represent the large amount of numeric data generated in a compact manner. A perceptual experiment was conducted to investigate the perception of imperatives. The prosodic parameters that were extracted from the acoustic analysis were applied to synthesize imperatives in different contexts. A novel approach to Xhosa speech synthesis was adopted: verbs spoken in a monotone were recorded by one speaker, and the pitch and duration of these words were then manipulated with the TD-PSOLA technique.
Combining the results of the acoustic analysis and the perceptual experiment made it possible to present a prosodic model for the generation of perceptually acceptable imperatives in a practical Xhosa TTS system. Prosody generation in a natural language processing (NLP) module and its place within the larger framework of text-to-speech synthesis were discussed. It was shown that existing architectures for TTS synthesis would not be appropriate for Xhosa without some adaptation. Hence, a unique architecture was suggested and its possible application subsequently illustrated. Of particular importance was the development of an alternative algorithm for grapheme-to-phoneme conversion.
Keywords: prosody, speech synthesis, speech perception, acoustic analysis, Xhosa
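
The compact statistical summary of raw acoustic data mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual procedure: the feature names, the grouping by syllable position, and every numeric value below are hypothetical and chosen only to show the idea of reducing many measurements of pitch, duration and intensity to a few descriptive statistics per group.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical feature names for the three prosodic parameters in the abstract.
FEATURES = ("f0_hz", "duration_ms", "intensity_db")


def summarize(rows, group_key="position"):
    """Reduce raw measurements to count, mean and sample SD per group and feature."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row)

    summary = {}
    for group, members in grouped.items():
        summary[group] = {}
        for feature in FEATURES:
            values = [m[feature] for m in members]
            summary[group][feature] = {
                "n": len(values),
                "mean": mean(values),
                # Sample SD needs at least two values; fall back to 0.0 otherwise.
                "sd": stdev(values) if len(values) > 1 else 0.0,
            }
    return summary


# Invented measurements for illustration only; they are not data from the thesis.
rows = [
    {"position": "penultimate", "f0_hz": 180.0, "duration_ms": 200.0, "intensity_db": 70.0},
    {"position": "penultimate", "f0_hz": 190.0, "duration_ms": 220.0, "intensity_db": 72.0},
    {"position": "final", "f0_hz": 150.0, "duration_ms": 120.0, "intensity_db": 64.0},
    {"position": "final", "f0_hz": 160.0, "duration_ms": 140.0, "intensity_db": 66.0},
]

summary = summarize(rows)
for group, stats in summary.items():
    print(group, stats)
```

Grouping by a single key keeps the sketch short; a real analysis would presumably also separate speakers, verb types and contexts before testing for significance.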

AFRIKAANS SUMMARY (translated): This study focuses on the prosodic features of imperatives and the role of prosody in the development of a text-to-speech system for Xhosa, an African tone language. The perception of prosody is manifested in suprasegmental features such as fundamental frequency (pitch), intensity (loudness) and duration (length). Little experimental research exists on the prosodic features of any grammatical structures (mood and tense) in Xhosa. How, and to what extent, the different prosodic features are combined and used in the production and perception of Xhosa speech is not yet clear. One grammatical structure for which no explicit descriptive phonetic information exists is the imperative mood, which expresses commands. This study shows how the relationship between duration, pitch and loudness, as manifested in the production and perception of Xhosa imperatives, could be determined through acoustic analyses and perceptual experiments. An experimental phonetic approach proved essential for obtaining meaningful and reliable prosodic information. An extensive acoustic analysis was carried out to obtain prosodic data on the production of imperatives by Xhosa mother-tongue speakers. Subsequently, various statistical analyses were performed on the raw acoustic data (i) to determine patterns of significance and (ii) to represent the large amount of numeric data generated more compactly. A perceptual experiment was carried out to investigate the perception of imperatives. The prosodic parameters obtained from the acoustic analysis were applied in the synthesis of commands in different contexts. A new approach to Xhosa speech synthesis was followed: verbs spoken in a monotone were recorded by one speaker, and the pitch and duration of these words were manipulated with the TD-PSOLA technique.
A combination of acoustic and perceptual results was used to develop a prosodic model for the synthesis of perceptually acceptable imperatives in a practical Xhosa text-to-speech synthesizer. Prosody generation in a natural language processing module and its place within the framework of text-to-speech synthesis were discussed. It was shown that existing architectures for text-to-speech systems will not be applicable to Xhosa without some adaptation. A unique architecture was therefore suggested and its possible application illustrated. The development of an alternative algorithm for letter-to-sound conversion was of particular importance.
Keywords: speech synthesis, speech perception, acoustic analysis, Xhosa

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/51891