Parallel likelihood calculations for phylogenetic trees

Hayward, Peter

Parallel likelihood calculations for phylogenetic trees

dc.contributor.advisor	Scheffler, K.	en_ZA
dc.contributor.author	Hayward, Peter	en_ZA
dc.contributor.other	Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences.	en_ZA
dc.date.accessioned	2011-11-16T15:17:06Z	en_ZA
dc.date.available	2011-12-05T13:11:20Z
dc.date.issued	2011-12	en_ZA
dc.description	Thesis (MSc)--Stellenbosch University, 2011.	en_ZA
dc.description.abstract	ENGLISH ABSTRACT: Phylogenetic analysis is the study of evolutionary relationships among organisms. To this end, phylogenetic trees, or evolutionary trees, are used to depict the evolutionary relationships between organisms as reconstructed from DNA sequence data. The likelihood of a given tree is commonly calculated for many purposes including inferring phylogenies, sampling from the space of likely trees and inferring other parameters governing the evolutionary process. This is done using Felsenstein’s algorithm, a widely implemented dynamic programming approach that reduces the computational complexity from exponential to linear in the number of taxa. However, with the advent of efficient modern sequencing techniques the size of data sets are rapidly increasing beyond current computational capability. Parallel computing has been used successfully to address many similar problems and is currently receiving attention in the realm of phylogenetic analysis. Work has been done using data decomposition, where the likelihood calculation is parallelised over DNA sequence sites. We propose an alternative way of parallelising the likelihood calculation, which we call segmentation, where the tree is broken down into subtrees and the likelihood of each subtree is calculated concurrently over multiple processes. We introduce our proposed system, which aims to drastically increase the size of trees that can be practically used in phylogenetic analysis. Then, we evaluate the system on large phylogenies which are constructed from both real and synthetic data, to show that a larger decrease of run times are obtained when the system is used.	en_ZA
dc.description.abstract	AFRIKAANSE OPSOMMING:Filogenetiese analise is die studie van evolusionêre verwantskappe tussen organismes. Filogenetiese of evolusionêre bome word aangewend om die evolusionêre verwantskappe, soos herwin vanuit DNS-kettings data, tussen organismes uit te beeld. Die aanneemlikheid van ’n gegewe filogenie word oor die algemeen bereken en aangewend vir menigte doeleindes, insluitende die afleiding van filogenetiese bome, om te monster vanuit ’n versameling van sulke moontlike bome en vir die afleiding van ander belangrike parameters in die evolusionêre proses. Dit word vermag met behulp van Felsenstein se algoritme, ’n alombekende benaderingwyse wat gebruik maak van dinamiese programmering om die berekeningskompleksiteit van eksponensieel na lineêr in die aantal taxa, te herlei. Desnieteenstaande, het die koms van moderne, doeltreffender orderingsmetodes groter datastelle tot gevolg wat vinnig besig is om bestaande berekeningsvermoë te oorskry. Parallelle berekeningsmetodes is reeds suksesvol toegepas om vele soortgelyke probleme op te los, met groot belangstelling tans in die sfeer van filogenetiese analise. Werk is al gedoen wat gebruik maak van data dekomposisie, waar die aanneemlikheidsberekening oor die DNS basisse geparallelliseer word. Ons stel ’n alternatiewe metode voor, wat ons segmentasie noem, om die aanneemlikheidsberekening te parallelliseer, deur die filogenetiese boom op te breek in sub-bome, en die aanneemlikheid van elke sub-boom gelyklopend te bereken oor verskeie verwerkingseenhede. Ons stel ’n stelsel voor wat dit ten doel het om ’n drastiese toename in die grootte van die bome wat gebruik kan word in filogenetiese analise, teweeg te bring. Dan, word ons voorgestelde stelsel op groot filogenetiese bome, wat vanaf werklike en sintetiese data gekonstrueer is, evalueer. Dit toon aan dat ’n groter afname in looptyd verkry word wanneer die stelsel in gebruik is.	af_ZA
dc.format.extent	88 p. : ill.
dc.identifier.uri	http://hdl.handle.net/10019.1/17919	en_ZA
dc.language.iso	en_ZA	en_ZA
dc.publisher	Stellenbosch : Stellenbosch University	en_ZA
dc.rights.holder	Stellenbosch University	en_ZA
dc.subject	Phylogenetics	en_ZA
dc.subject	Bioinformatics	en_ZA
dc.subject	High-performance computing	en_ZA
dc.subject	Parallel programming	en_ZA
dc.subject	Dissertations -- Computer science	en_ZA
dc.subject	Theses -- Computer science	en_ZA
dc.subject	Dissertations -- Mathematics	en_ZA
dc.subject	Theses -- Mathematics	en_ZA
dc.title	Parallel likelihood calculations for phylogenetic trees	en_ZA
dc.type	Thesis

Files

Original bundle

Now showing 1 - 1 of 1

Name:: hayward_parallel_2011.pdf
Size:: 3.88 MB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 1.98 KB
Format:: Plain Text
Description:

Download

Collections

Masters Degrees (Mathematical Sciences)