Quality control for data-dependent and data-independent mass-spectrometry-based proteomics

Marina, Kriek

Files

kriek_quality_2020.pdf (4.47 MB)

Date

2020-12

Authors

Marina, Kriek

Publisher

Stellenbosch : Stellenbosch University

Abstract

ENGLISH ABSTRACT: Discovery proteomics is advancing rapidly, and quality control of the technique must adapt accordingly. In 2012, a console application, QuaMeter, was created to produce quality control metrics for data-dependent proteomics, based on metrics first designed by the US National Institute of Standards and Technology (NIST). In 2014, the tool gained an identification-independent mode, which generates 44 quality metrics, though these still apply only to data-dependent acquisition. The development of new data-independent acquisition methods in recent years has created the need for a data-independent acquisition version of QuaMeter. The QuaMeter metrics must also still be analysed in a statistical framework such as R or Python to take full advantage of their multivariate nature. Biologists who are inexperienced at programming or at using a console might therefore find such software limiting, and there is a desire for a tool with a user interface with which to analyse the metrics. Here, I have created a console application for the analysis of data-independent acquisition results. The tool provides a platform for in-depth analysis of data quality. It is the first of its kind to allow the user to divide the retention time into segments and return quality metrics for each segment separately. This gives the researcher extra insight into the chromatography steps and, as I illustrate here, the tool reveals problems that would not have been visible if only one value per metric were provided for the entire file. In addition, the m/z axis is split according to the data’s underlying isolation window structure, and metrics are calculated for each window separately to equip the researcher with additional information for method development. A further set of metrics produces one value for the entire file, for easy outlier detection among files.
This project also involves the creation of a desktop application with a user interface for running either of the two console applications. This tool can also perform some of the key downstream analyses regularly performed in quality control. Outlier detection is enabled via PCA, classification of longitudinal data as good or bad quality is performed with random forest analysis, and individual metrics can be visualized against their distributions. In addition, many quality control principles, such as experimental design and identifying sources of variability in an experiment, are explained and demonstrated in the context of the quality control metrics, as are conventional quality control techniques such as outlier detection and classification of data quality.
AFRIKAANS ABSTRACT (English translation): Protein mass spectrometry has advanced very rapidly over the past decade, and quality control of the technique must therefore be adapted accordingly. In 2012, a console application, QuaMeter, was created to produce quality metrics for data-dependent proteome analysis. This version of the application was based on an application by the US National Institute of Standards and Technology (NIST). In 2014, an identification-independent version of the software was added, which reports 44 quality metrics, but still only for data-dependent acquisition techniques. More recently, however, new data-independent acquisition methods have been designed that enjoy reasonable support in the community. A need has therefore arisen for a data-independent version of QuaMeter. The results of QuaMeter must, however, still be analysed downstream in a statistical framework such as R or Python to make full use of the multivariate nature of QuaMeter. Biologists who are inexperienced in programming or in using a console may regard this as an insurmountable obstacle. I have therefore built a console application, SwaMe, for the analysis of data-independent acquisition results. SwaMe provides a platform for a more in-depth analysis of data quality. This tool is the first of its kind to allow the user to divide the retention time into segments and to calculate quality metrics for each segment separately. In this way the researcher can gain insight into the chromatography and, as I show here, instrumental problems are exposed that would not have been visible had only one value per sample been reported. The m/z axis is subdivided into the data’s underlying isolation window structure, and average metrics are provided for each window separately, which further eases method development.
A set of metrics that calculates only one value per sample is also provided, which is especially useful in outlier detection. The project also includes the design of a graphical user interface application, Assurance, which offers a platform for using the two console applications. This tool can also help to carry out some of the most important downstream statistical analyses. These are regularly performed in quality control and include outlier identification by principal component analysis and the classification of longitudinal data as good or bad by machine learning; individual metrics can also be visualised against the data distribution. Numerous quality control principles, such as experimental design and the identification of sources of variability, are also explained and demonstrated in the context of the quality metrics. In addition, traditional quality control techniques such as data classification are demonstrated.
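The retention-time segmentation described in the abstract can be illustrated with a minimal sketch. This is not the thesis software: the function name `segment_metrics`, the input layout (pairs of retention time and total ion current), the equal-width segments, and the choice of median TIC as the metric are all assumptions made for illustration.

```python
from statistics import median


def segment_metrics(scans, n_segments=4):
    """Split scans into equal-width retention-time segments and summarise each.

    `scans` is a list of (retention_time, total_ion_current) pairs
    (a hypothetical input layout). Returns one dict per segment with
    the scan count and the median TIC of that segment.
    """
    if not scans:
        return []
    rts = [rt for rt, _ in scans]
    rt_min, rt_max = min(rts), max(rts)
    # Fall back to a width of 1.0 if all scans share one retention time.
    width = (rt_max - rt_min) / n_segments or 1.0
    buckets = [[] for _ in range(n_segments)]
    for rt, tic in scans:
        # Clamp so that the scan at rt_max lands in the last segment.
        idx = min(int((rt - rt_min) / width), n_segments - 1)
        buckets[idx].append(tic)
    return [
        {
            "segment": i + 1,
            "rt_start": rt_min + i * width,
            "rt_end": rt_min + (i + 1) * width,
            "n_scans": len(tics),
            "median_tic": median(tics) if tics else None,
        }
        for i, tics in enumerate(buckets)
    ]


# Made-up example: the signal drops sharply in the third quarter of the run.
example_scans = [
    (0.0, 100.0), (10.0, 110.0), (20.0, 105.0), (30.0, 95.0),
    (40.0, 10.0), (50.0, 12.0), (60.0, 100.0), (70.0, 102.0),
]

print("whole-file median TIC:", median(tic for _, tic in example_scans))
for row in segment_metrics(example_scans, n_segments=4):
    print(row["segment"], row["n_scans"], row["median_tic"])
```

In this made-up run the whole-file median TIC is 100.0, which looks unremarkable, while the per-segment medians (105.0, 100.0, 11.0, 101.0) expose the drop in the third segment. That is the kind of problem a single per-file value can hide.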

Description

Thesis (PhD)--Stellenbosch University, 2020.

Keywords

Bioinformatics, UCTD, Proteomics, Mass spectrometry

URI

http://hdl.handle.net/10019.1/109383

Collections

Doctoral Degrees (Molecular Biology and Human Genetics)
