A phylogenomic investigation into the evolution and biological characteristics of the Beijing lineage family of principal genetic group 1 members of Mycobacterium tuberculosis

Siame, Kabengele Keith (2016-12)

Thesis (MSc)--Stellenbosch University, 2016

Thesis

ENGLISH ABSTRACT : Tuberculosis continues to be a global health concern and warrants an increased understanding of its causative agent, Mycobacterium tuberculosis (M. tuberculosis), in terms of its evolution, virulence and other biological traits. M. tuberculosis has been sub-divided into a number of lineage families and sub-lineages based on a number of molecular markers. The Beijing lineage of M. tuberculosis has been responsible for a large proportion of tuberculosis cases in Cape Town, South Africa. Its evolution and biological characteristics in Cape Town have been investigated using a variety of molecular markers, resulting in the identification of 7 sub-lineages. These, however, have not been investigated as a group using whole genome sequencing. Furthermore, two isolates from sub-lineage 7, reflecting either on-going transmission (clustered) or the absence of transmission (unique), were shown to have hyper-virulent and hypo-virulent phenotypes, respectively, in a murine infection model. Additionally, the hyper-virulent strain elicited an anti-inflammatory TH2 immune response in the murine model, whilst the hypo-virulent strain elicited a pro-inflammatory TH1 immune response. The genetic mechanism(s) underlying these contrasting phenotypes remain to be elucidated. This study aimed to further elucidate the evolutionary history of the 7 sub-lineages of the Beijing lineage of M. tuberculosis in a suburb of Cape Town, South Africa, using whole genome sequencing analysis. Whole genome sequencing of the 7 sub-lineages of the Beijing lineage was performed on an Illumina platform generating 105 bp paired-end reads. In addition, 2 strains of sub-lineage 7 having contrasting phenotypes were further sequenced on a PacBio platform generating long single-end raw reads. Three mapping algorithms were used to align the Illumina paired-end reads to the M. tuberculosis reference H37Rv. 
The overlap of the SNPs called by each mapping algorithm determined our set of high-confidence SNPs, which were subsequently used for phylogenetic and comparative SNP analysis. De novo assembly was performed using MIRA and CELERA software to generate a hybrid assembly from Illumina and PacBio raw reads. The super-contig was searched to identify the sequence adjacent to the IS6110 location. NCBI BLAST was used to determine the location of the IS6110 element with respect to the M. tuberculosis H37Rv or M. bovis AF2122/97 complete reference genomes. A comparison of phylogenetic trees showed that the genome-wide SNP tree generated in this study differed from the tree based on insertion point markers and selected SNPs in the mutT and ogt genes in that the evolutionary positions of sub-lineages 5 and 6 were interchanged. The latter markers were, however, more appropriate for molecular epidemiology studies. The genome-wide phylogenetic trees were also superior to trees based on 43 SNPs in the replication, repair and recombination genes, in that the latter exhibited branch collapse in this study. The comparative SNP analysis among the 7 sub-lineages of Beijing showed that amino acid changes evolved mostly in genes of the cell wall, cell processes, intermediary metabolism and respiration. Significant overrepresentation of biological processes associated with these changes was, however, only observed in sub-lineage 1 and in the observed common ancestor of sub-lineages 1, 2 and 3. Intergenic SNPs unique to each sub-lineage were, however, identified in close proximity to previously described transcriptional start sites and thus warrant further investigation of their associated transcriptional promoter activity. The more focused analysis of 2 closely related members of Beijing sub-lineage 7 having contrasting virulence phenotypes showed that each had unique predicted deleterious non-synonymous SNPs, which were associated with their whole proteome expression. 
This included a protein involved in lipid metabolism that was expressed only in the hyper-virulent strain, whereas the hypo-virulent strain had a deleterious SNP for the protein and no protein expression. De novo assembly of the two strains also revealed structural variation in the form of a number of unique IS6110 transposon elements. Of these, one IS6110 element unique to the hypo-virulent strain had an associated large sequence inversion event, which has been reported previously by others.
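
The high-confidence SNP step described in the abstract (keeping only SNPs called by every mapping algorithm) can be illustrated with a minimal sketch. This is not the pipeline used in the thesis: the `Snp` record, the function name `high_confidence_snps` and the example calls are all hypothetical, and the sketch assumes each mapper's calls have already been reduced to (position, reference allele, alternate allele) relative to the same reference.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record: coordinate on the reference plus alleles."""
    position: int
    ref: str
    alt: str


def high_confidence_snps(call_sets: Iterable[Iterable[Snp]]) -> list[Snp]:
    """Return the SNPs present in every call set, sorted by position."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        return []
    # A SNP is kept only if all mappers agree on position and both alleles.
    return sorted(set.intersection(*sets))


# Made-up calls from three hypothetical mappers (not data from the thesis).
mapper_a = [Snp(1000, "A", "G"), Snp(2500, "C", "T"), Snp(4000, "G", "A")]
mapper_b = [Snp(1000, "A", "G"), Snp(2500, "C", "T")]
mapper_c = [Snp(1000, "A", "G"), Snp(2500, "C", "A"), Snp(4000, "G", "A")]

consensus = high_confidence_snps([mapper_a, mapper_b, mapper_c])
print(consensus)
```

In this toy input only the SNP at position 1000 survives: position 4000 is missing from one mapper, and at position 2500 the mappers disagree on the alternate allele. Requiring exact agreement is a conservative choice that trades sensitivity for fewer false positives.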

AFRIKAANS ABSTRACT (English translation) : Tuberculosis remains a global health problem and requires a better understanding of the organism that causes it, Mycobacterium tuberculosis (M. tuberculosis), particularly with regard to its evolution, virulence and other biological characteristics. M. tuberculosis strains can be divided into a number of lineage families and sub-lineages on the basis of a number of molecular markers. The Beijing lineage of M. tuberculosis is responsible for a large proportion of the tuberculosis cases in Cape Town, South Africa. The evolution and biological characteristics of the Beijing lineage in Cape Town have been investigated using a variety of molecular markers, which led to the identification of 7 sub-lineages. These sub-lineages have, however, not previously been investigated as a group using whole genome sequencing. Furthermore, isolates within sub-lineages differ, as shown by two isolates of sub-lineage 7 that are, respectively, transmitted on an ongoing basis (clustered) or not transmitted (unique). It has also been shown in a mouse model that these isolates are hyper-virulent and hypo-virulent, respectively. In addition, the hyper-virulent strain elicited an anti-inflammatory TH2 immune response in the mouse model, whereas the hypo-virulent strain showed a pro-inflammatory TH1 immune response. The genetic mechanism(s) responsible for these contrasting phenotypes have yet to be explained. This study further aimed to elucidate the evolutionary history of the 7 sub-lineages of the Beijing lineage of M. tuberculosis in a suburb of Cape Town, South Africa, by means of whole genome sequencing. Whole genome sequencing of the 7 sub-lineages of the Beijing lineage was performed on an Illumina platform generating 105 base pair paired-end reads. Further sequencing of 2 strains of sub-lineage 7 with contrasting phenotypes was performed on a PacBio platform, which generates long single-end reads. 
Three mapping algorithms were used to align the Illumina paired-end reads to the M. tuberculosis H37Rv reference strain. The overlap of the SNPs called by each mapping algorithm determined a set of high-confidence SNPs, which were subsequently used for phylogenetic and comparative SNP analyses. De novo assembly was performed with MIRA and CELERA software to generate a hybrid assembly from Illumina and PacBio raw reads. To determine the positions of IS6110 relative to M. tuberculosis H37Rv or M. bovis AF2122/97 using NCBI BLAST, regions adjacent to IS6110, whose sequence is known, were identified in this super-assembly. Comparative studies of phylogenetic trees based on genome-wide SNPs showed that the genome-wide SNP tree generated in this study differs from the one based on insertion point markers and SNPs in the mutT and ogt genes in that the evolutionary positions of sub-lineages 5 and 6 are interchanged. The latter markers were, however, more appropriate for molecular epidemiological studies. The genome-wide phylogenetic trees were also better than the trees based on 43 SNPs in the replication, repair and recombination genes, in that the latter caused branch collapse in this study. The comparative SNP analysis among the 7 Beijing sub-lineages shows that the evolution of amino acid changes occurs mostly in genes of the cell wall, cell processes, intermediary metabolism and respiration. Significant overrepresentation of biological processes associated with these changes was, however, observed in sub-lineage 1 and in the observed common ancestor of sub-lineages 1, 2 and 3. Intergenic SNPs unique to each sub-lineage were, however, observed at positions close to previously described transcriptional start sites, and warrant further research into the associated transcriptional promoter activity. 
The more focused analysis of 2 closely related members of Beijing sub-lineage 7 with contrasting virulence phenotypes showed different predicted deleterious non-synonymous SNPs, which are associated with their whole-proteome expression. This includes a lipid protein expressed only in the hyper-virulent strain, whereas the hypo-virulent strain had a deleterious SNP for the protein, with no protein expression. De novo assembly of the two strains also revealed structural variation in the form of a number of unique IS6110 transposon elements. One of these IS6110 elements, unique to the hypo-virulent strain, had an associated large sequence inversion that has previously been described by other authors.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/100029