Network-based contextualisation of LC-MS/MS proteomics data

Geiger, Armin Guntram (2014-12)

Thesis (MSc)--Stellenbosch University, 2014.


ENGLISH ABSTRACT: This thesis explores the use of networks as a means to visualise, interpret and mine MS-based proteomics data. A network-based approach was applied to a quantitative, cross-species LC-MS/MS dataset derived from two yeast species, namely Saccharomyces cerevisiae strain VIN13 and Saccharomyces paradoxus strain RO88. To identify and quantify proteins from the mass spectra, a workflow combining custom-built and existing programs was assembled. Networks placing the identified proteins in several biological contexts were then constructed. These contexts included sequence similarity to other proteins, ontological descriptions, protein-protein interactions, metabolic pathways and cellular location. The contextual, network-based representations of the proteins proved effective for identifying trends and patterns in the data that might otherwise have been obscured. Moreover, by integrating the experimentally derived data with multiple extant biological resources, the networks presented the data in a way that better reflects the interconnected biological system from which the samples were derived. Both existing and new hypotheses concerning proteins of the yeast cell wall and proteins of putative oenological potential were investigated in light of their differential expression between the two yeast species. Examples included cell wall proteins such as GGP1 and SCW4 and, among the proteins of putative oenological potential, haze protection factor proteins such as HPF2. Differences between the two strains in their capacity for maloethanolic fermentation were also examined in light of the protein data. Finally, the network-based representations allowed new hypotheses to be formed about proteins that were identified in the dataset but are of unknown function.

AFRIKAANS SUMMARY (translated): This study explores the use of networks to visualise, interpret and mine proteomics data. A network-based approach was followed to analyse a quantitative LC-MS/MS dataset derived from two yeast species, namely Saccharomyces cerevisiae strain VIN13 and Saccharomyces paradoxus strain RO88. The mass spectra were processed with existing and purpose-written computer programs, assembled into a workflow for identifying and quantifying the proteins concerned. These proteins were then linked to existing biological databases to place them in biological context, and the contextualised data were used to build biological networks. The contexts considered included, among others, the cellular localisation of activities, ontological descriptions, amino acid sequence similarity, interactions with known proteins, and association with metabolic pathways. This contextual, network-based representation of the proteins effectively revealed clear trends and patterns in the data that would otherwise not have been noticeable. In addition, combining experimental data with existing biological resources gave the data analysis a better perspective. Both existing and new hypotheses regarding yeast cell wall proteins and proteins with possible oenological potential were investigated in light of their differential expression in the two yeast species. Examples investigated include cell wall proteins such as GGP1 and SCW4, as well as the haze protection factor protein HPF2. Differences in capacity for maloethanolic fermentation were also found. The network-based representations also led to new hypotheses about proteins in the dataset whose functions are currently unknown.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/96116