Wavelet-based speech enhancement : a statistical approach

Date
2004-12
Authors
Harmse, Wynand
Publisher
Stellenbosch : University of Stellenbosch
Abstract
ENGLISH ABSTRACT: Speech enhancement is the process of removing background noise from speech signals. The equivalent process for images is known as image denoising. While the Fourier transform is widely used for speech enhancement, image denoising typically uses the wavelet transform. Research on wavelet-based speech enhancement has only recently emerged, yet it shows promising results compared to Fourier-based methods. This research is enhanced by the availability of new wavelet denoising algorithms based on the statistical modelling of wavelet coefficients, such as the hidden Markov tree. The aim of this research project is to investigate wavelet-based speech enhancement from a statistical perspective. Current Fourier-based speech enhancement and its evaluation process are described, and a framework is created for wavelet-based speech enhancement. Several wavelet denoising algorithms are investigated, and it is found that the algorithms based on the statistical properties of speech in the wavelet domain outperform the classical and more heuristic denoising techniques. The choice of wavelet influences the quality of the enhanced speech and the effect of this choice is therefore examined. The introduction of a noise floor parameter also improves the perceptual quality of the wavelet-based enhanced speech, by masking annoying residual artifacts. The performance of wavelet-based speech enhancement is similar to that of the more widely used Fourier methods at low noise levels, with a slight difference in the residual artifact. At high noise levels, however, the Fourier methods are superior.
AFRIKAANSE OPSOMMING: Spraaksuiwering is die proses waardeur agtergrondgeraas uit spraakseine verwyder word. Die ekwivalente proses vir beelde word beeldsuiwering genoem. Terwyl spraaksuiwering in die algemeen in die Fourier-domein gedoen word, gebruik beeldsuiwering tipies die golfietransform. Navorsing oor golfie-gebaseerde spraaksuiwering het eers onlangs verskyn, en dit toon reeds belowende resultate in vergelyking met Fourier-gebaseerde metodes. Hierdie navorsingsveld word aangehelp deur die beskikbaarheid van nuwe golfie-gebaseerde suiweringstegnieke wat die golfie-koëffisiënte statisties modelleer, soos die verskuilde Markovboom. Die doel van hierdie navorsingsprojek is om golfie-gebaseerde spraaksuiwering vanuit ’n statistiese oogpunt te bestudeer. Huidige Fourier-gebaseerde spraaksuiweringsmetodes asook die evalueringsproses vir sulke algoritmes word bespreek, en ’n raamwerk word geskep vir golfie-gebaseerde spraaksuiwering. Verskeie golfie-gebaseerde algoritmes word ondersoek, en daar word gevind dat die metodes wat die statistiese eienskappe van spraak in die golfie-gebied gebruik, beter vaar as die klassieke en meer heuristiese metodes. Die keuse van golfie beïnvloed die kwaliteit van die gesuiwerde spraak, en die effek van hierdie keuse word dus ondersoek. Die gebruik van ’n ruisvloer parameter verhoog ook die kwaliteit van die golfie-gesuiwerde spraak, deur steurende residuele artifakte te verberg. Die golfie-metodes vaar omtrent dieselfde as die klassieke Fourier-metodes by lae ruisvlakke, met ’n klein verskil in residuele artifakte. By hoë ruisvlakke vaar die Fouriermetodes egter steeds beter.
Description
Thesis (MScIng)--University of Stellenbosch, 2004.
Keywords
Speech synthesis, Speech processing systems, Theses -- Electronic engineering, Dissertations -- Electronic engineering