Leveraging shotgun proteomics for optimised interpretation of data-independent acquisition data: identification of diagnostic biomarkers for paediatric tuberculosis

Ehlers, Ashley (2020-12)

Thesis (MScMedSc)--Stellenbosch University, 2020.

Thesis

ENGLISH ABSTRACT: Although diagnostic tests for paediatric tuberculosis (TB) are available, no specific test has been tailored to the diagnostic challenges children present or to limited-resource settings. The high mortality rates recorded annually are associated with late diagnosis as well as insufficient household contact management (HCM). Further, urine has been identified as an attractive biofluid for protein biomarker discovery. Urine collection is non-invasive and low in cost, and urine is easily obtainable in large quantities. Improved data analysis approaches for protein and peptide identification and quantification have paved the way for the development of novel urine protein biomarkers for paediatric TB. Data-dependent acquisition (DDA) is a powerful approach for discovering possible urine protein markers. By leveraging the protein and peptide identification capabilities of shotgun proteomics using database search algorithms, an optimised data-independent acquisition (DIA) analysis method was developed. In this study, prior to data analysis, the quality of the DDA and DIA approaches was evaluated by identifying batch effects and assessing dissimilarity so that abnormal runs could be identified and subsequently excluded. It is hypothesised that the quantity of specific host proteins in urine differs between children with TB and symptomatic control children who do not have TB. Using an optimised DIA data analysis method that leverages DDA data will allow statistical identification of differentially abundant proteins in comparative proteomics.
In this study, the MSstats R package for protein-level abundance testing was employed to generate comparisons between two groups, TB cases and controls, for a South African human immunodeficiency virus (HIV)-negative cohort. Three human proteins, leucine-rich alpha-2-glycoprotein (A2GL), aggrecan core protein (PGCA) and cartilage intermediate layer protein 2 (CILP2), were identified as significantly different in abundance between the two groups. The findings of this study support the hypothesis that an optimised DIA data analysis method leveraging DDA data will identify differentially abundant proteins, potentially leading to validation of these discovery-phase urine protein markers for use in clinical settings.

AFRIKAANS SUMMARY (translated): Although diagnostic tests for paediatric tuberculosis (TB) are available, no specific test has been adapted to the diagnostic challenges that children present or to limited-resource settings. The high mortality rates recorded annually are associated with late diagnosis as well as insufficient household contact management (HCM). Furthermore, urine has been identified as an attractive biofluid for protein biomarker discovery. Urine collection is non-invasive, urine is easily obtainable in large quantities, and collection costs are low. Improved data analysis approaches for the identification and quantification of proteins and peptides have paved the way for the development of new urine protein biomarkers for TB in children. Data-dependent acquisition (DDA) is a powerful approach for discovering possible urine protein markers. By making use of the capabilities of shotgun proteomics for protein and peptide identification with database search algorithms, an optimised data-independent acquisition (DIA) analysis method was developed. In this study, before data analysis was carried out, the quality of the DDA and DIA approaches was evaluated by identifying batch effects and assessing the differences so that abnormal samples (outliers) could be identified and subsequently excluded. It is hypothesised that the quantity of specific proteins in urine differs for children with TB compared with symptomatic control children who do not have TB.
Using an optimised DIA data analysis method that makes use of DDA data allows statistical identification of proteins present at different levels in comparative proteomics. In this study, the MSstats R package for protein-level abundance testing was used to generate comparisons between two groups, TB cases and controls, for a South African human immunodeficiency virus (HIV)-negative group. Three human proteins, leucine-rich alpha-2-glycoprotein (A2GL), aggrecan core protein (PGCA) and cartilage intermediate layer protein 2 (CILP2), were identified as significantly different. The findings of this study support the hypothesis that an optimised DIA data analysis method that makes use of DDA data will identify the differential proteins, which could lead to validation for use as discovery-phase urine protein markers in the clinical setting.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/109276