Evaluating the applications of spatial audio in telephony

Blum, Konrad (2010-03)

Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010.

Thesis

ENGLISH ABSTRACT: Telephony has developed substantially over the years, but the fundamental auditory model of mixing all the audio from different sources together into a single monaural stream has not changed since the telephone was first invented. Monaural audio is very difficult to follow in a multiple-source situation such as a conference call. Sound originating from a specific point in space will travel along a slightly different path to each ear. Although we are not consciously aware of it, our brain processes these spatial cues to help us to locate sounds in space. It is this spatial information that allows us to focus our attention and listen to a single speaker in an environment where many different sources may be active at the same time; a phenomenon known as the "cocktail party effect". It is possible to reproduce these spatial cues in a sound recording, using Head-Related Transfer Functions (HRTFs) to allow a listener to experience localised audio, even when sound is reproduced through a headset. In this thesis, spatial audio is implemented in a telephony application as well as in a virtual world. Experiments were conducted which demonstrated that spatial audio increases the intelligibility of speech in a multiple-source environment and aids active speaker identification. Resource usage measurements show that these benefits are, however, not without a cost. In conclusion, spatial audio was shown to be an improvement over the monaural audio model traditionally implemented in telephony.

AFRIKAANSE OPSOMMING: Telefonie het aansienlik ontwikkel oor die jare, maar die basiese ouditiewe model waarin die klank van alle verskillende bronne bymekaar gemeng word na een enkelouditoriese stroom het nie verander sedert die eerste telefoon gebou is nie. Enkelouditoriese klank is baie moeilik om te volg in 'n meervoudigebron situasie, soos byvoorbeeld in 'n konferensie oproep. Klank met oorsprong by 'n sekere punt in die ruimte sal 'n effens anderse pad na elke oor volg. Selfs is ons nie aktief bewus hiervan nie, verwerk ons brein hierdie ruimtelike aanduidinge om ons te help om klanke in die ruimte te vind. Dit is hierdie ruimtelike inligting wat ons toelaat om ons aandag te vestig en te luister na 'n enkele spreker in 'n omgewing waar baie verskillende bronne terselfdertyd aktief mag wees, 'n verskynsel wat bekend staan as die "skemerkelkiepartytjieeffek". Dit is moontlik om hierdie ruimtelike leidrade na 'n klank te reproduseer met behulp van hoofverwandeoordragfunksies (HRTFs) en om daardeur 'n luisteraar gelokaliseerde klank te laat ervaar, selfs wanneer die klank deur middel van oorfone gespeel word. In hierdie tesis word ruimtelike klank geïmplementeer in 'n telefonieprogram, sowel as in 'n virtuele wêreld. Eksperimente is uitgevoer wat getoon het dat ruimtelike klank die verstaanbaarheid van spraak in 'n meerderebronomgewing verhoog en help met aktiewe spreker identifikasie. Hulpbrongebruiks metings toon aan dat hierdie voordele egter nie sonder 'n koste kom nie. Ter afsluiting, dit is bewys dat ruimtelike klank 'n verbetering teweeg gebring het oor die enkelouditorieseklankmodel wat tradisioneel in telefonie gebruik het.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/4376