Lung health diagnosis through cough sound analysis

Botha, Gert Hendrik Renier (2017-03)

Thesis (MEng)--Stellenbosch University, 2017.


ENGLISH ABSTRACT: This study investigates a simple and easily applied tool for tuberculosis (TB) screening based on the analysis of cough audio and objective clinical measurements. Tuberculosis is one of the most lethal diseases worldwide. Various diagnostic methods for TB exist. However, in lower-income areas, clinics lack the funds to afford expensive equipment and to employ the trained experts needed to interpret results. A database of cough audio recordings and clinical measurements was collected for this study. An automatic annotation system was developed using hidden Markov models (HMMs). The frame accuracy of the annotation system is 87.16%. For audio-based classification we considered logistic regression and Gaussian mixture models (GMMs). We found that filterbank energy features outperformed MFCC features when used for audio classification, which could indicate that cough audio contains information relevant to TB diagnosis that is not perceivable by the human auditory system. Feature selection was used to investigate the importance of different frequency bands for classification, and it was found that the best results were achieved when combining features from the human vowel range (below 1000 Hz) with features from high-frequency ranges. As the main evaluation metric, we used the area under the receiver operating characteristic curve (AUC). This metric was chosen because it is not affected by class imbalance in the dataset. Our best reported AUC was 94.94%, with a standard deviation of 4.62%, which was obtained using a set of just 5 filterbank energies. We also showed that audio-based classification obtains a higher AUC than classification on objective clinical measurements (metadata). Finally, we found that combining the audio and metadata classifier results using classifier fusion improved how well the model generalizes. By combining the best audio classifier with the best metadata classifier, we obtained a sensitivity, specificity, accuracy, AUC and kappa of 82.35%, 80.95%, 81.58%, 94.34% and 0.6867, respectively.
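As a rough illustration of the evaluation metrics named in the abstract, the following Python sketch computes sensitivity, specificity, accuracy and Cohen's kappa from a binary confusion matrix, and the AUC as the fraction of positive/negative score pairs that are ranked correctly. The function names, the counts and the scores are hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce its results.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard binary metrics from confusion-matrix counts (hypothetical helper)."""
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # fraction of actual positives detected
    specificity = tn / (tn + fp)   # fraction of actual negatives rejected
    accuracy = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative values only (assumed, not from the study).
metrics = confusion_metrics(tp=8, fn=2, tn=9, fp=1)
pos_scores = [0.9, 0.8, 0.4]
neg_scores = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(pos_scores, neg_scores)
# Replicating the negative class changes the class ratio but not the AUC.
auc_imbalanced = roc_auc(pos_scores, neg_scores * 3)
print(metrics, auc, auc_imbalanced)
```

Because the AUC depends only on how positives rank relative to negatives, replicating one class leaves it unchanged in this sketch, which is the property the abstract cites as the reason for choosing it.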

AFRIKAANSE OPSOMMING: Hierdie studie ondersoek 'n eenvoudige en maklik toegepaste instrument vir die skandering van tuberkulose (TB), gebaseer op die analise van hoes-audio en objektiewe kliniese metings. Tuberkulose is wêreldwyd een van die dodelikste siektes. Daar is verskeie metodes vir die diagnosering van TB. In laer-inkomste areas is daar egter gebrekkige befondsing vir duur toerusting en die aanstelling van opgeleide kundiges om toetsuitslae te interpreteer. 'n Databasis van hoes-audio opnames en kliniese metings is vir hierdie studie versamel. 'n Outomatiese annotasiestelsel is ontwikkel deur versteekte Markov modelle (HMMs) te gebruik. Die beramingsakkuraatheid vir die annotasiestelsel is 87.16%. Vir audio-gebaseerde klassifikasie het ons logistiese regressie en Gaussiese vermengingsmodelle (GMMs) gebruik. Ons het gevind dat filterbank energie kenmerke meer doeltreffend as MFCC kenmerke is wanneer dit vir audioklassifikasie gebruik is, wat kan aandui dat hoes-audio inligting relevant tot TB diagnose bevat wat nie deur die menslike gehoorstelsel geregistreer kan word nie. Funksie seleksie is gebruik om die belangrikheid van verskillende frekwensiebande vir klassifikasie te ondersoek en daar is gevind dat die optimale uitslae bereik is wanneer funksies van die menslike vokaalreeks (onder 1000 Hz) met funksies van hoë frekwensiereekse gekombineer is. Ons het die area onder die ontvangers operator eienskap kurwe (ROC AUC) as die hoofmatriks van evaluering gebruik. Hierdie matriks is gekies omdat dit nie deur klaswanbalans in die datastel geaffekteer word nie. Ons mees doeltreffende AUC was 94.94%, met 'n standaardafwyking van 4.62%, wat verkry is deur 'n stel van slegs 5 filterbankenergie te gebruik. Ons het ook gewys dat audio-gebaseerde klassifikasie 'n hoër AUC bereik as klassifikasie op objektiewe kliniese metings (metadata). Laastens het ons gevind dat die kombinering van die audio en metadata klassifiseringsuitslae deur klassifiseringsfusie die veralgemening van die model verbeter het. Deur die beste audio klassifiseerder met die beste metadata klassifiseerder te kombineer het ons 'n sensitiwiteit, spesifisiteit, akkuraatheid, AUC en kappa van 82.35%, 80.95%, 81.58%, 94.34% en 0.6867 onderskeidelik verkry.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/101258