Data-driven methods for exploratory analysis in chemometrics and scientific experimentation

Emerton, Guy (2014-04)

Thesis (MSc)--Stellenbosch University, 2014.

ENGLISH ABSTRACT: Background: New methods to facilitate exploratory analysis of scientific data are in high demand. There is an abundance of available data, used only for confirmatory analysis, from which new hypotheses can be drawn. To this end, two new exploratory techniques are developed: one for chemometrics and another for the visualisation of fundamental scientific experiments. The former transforms multiple large-scale raw HPLC/UV-vis data sets into a conserved set of putative features, something not often attempted outside of mass spectrometry. The latter method ('StatNet') applies network techniques to the results of designed experiments to gain a new perspective on relations between variables. Results: The data format resulting from untargeted chemometric processing was amenable to both chemical and statistical analysis. It proved to have integrity when machine-learning techniques were applied to infer attributes of the experimental set-up. The visualisation techniques were equally successful in generating hypotheses, and were easily extensible to three different types of experimental results. Conclusion: The overall aim was to create useful tools for hypothesis generation in a variety of data. This has largely been achieved through a combination of novel and existing techniques. It is hoped that the methods presented here will be further applied and developed.

AFRIKAANSE OPSOMMING: Agtergrond: Nuwe metodes om ondersoekende ontleding in wetenskaplike data te fasiliteer is in groot aanvraag. Daar is 'n oorvloed van beskikbaar data wat slegs gebruik word vir bevestigende ontleding waaruit nuwe hipoteses opgestel kan word. Vir hierdie doel, word twee nuwe ondersoekende tegnieke ontwikkel: een vir chemometrie en 'n ander vir die visualisering van fundamentele wetenskaplike eksperimente. Die eersgenoemde transformeer grootskaalse veelvoudige rou HPLC/UV-vis data in 'n bewaarde stel putatiewe funksies - iets wat nie gereeld buite Massaspektrometrie aangepak word nie. Die laasgenoemde metode ('StatNet') pas netwerktegnieke tot die resultate van ontwerpte eksperimente toe om sodoende 'n nuwe perspektief op veranderlike verhoudings te verkry. Resultate: Die gevolglike data formaat van die ongeteikende chemometriese verwerking was in 'n formaat wat vatbaar is vir beide chemiese en statistiese analise. Daar is bewys dat dit integriteit gehad het wanneer masjienleertegnieke toegepas is om eienskappe van die eksperimentele opstelling af te lei. Die visualiseringtegnieke was ewe suksesvol in die generering van hipoteses, en ook maklik uitbreibaar na drie verskillende tipes eksperimentele resultate. Samevatting: Die hoofdoel was om nuttige middele vir hipotese generasie in 'n verskeidenheid van data te skep. Dit is grootliks bereik deur 'n kombinasie van oorspronklike en bestaande tegnieke. Hopelik sal die metodes wat hier aangebied is verder toegepas en ontwikkel word.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/86366