Genome-Wide Associations Between Human Genotypes and Mycobacterium tuberculosis Clades Causing Disease

Pitts, Stephanie Julia (2019-03)

Thesis (PhD)--Stellenbosch University, 2019

Thesis

ENGLISH ABSTRACT: The World Health Organization (WHO) declared tuberculosis (TB) to be a global health emergency in 1993, and despite decades of extensive biomedical research, it remains a major cause of morbidity and mortality around the world. A disease primarily affecting the lungs, TB manifests following infection with a pathogenic member of the Mycobacterium tuberculosis (M. tb) Complex (MTBC) such as M. africanum and M. tb, although infection alone is not sufficient for disease. Each member of the MTBC consists of several strains (or clades), with variable virulence and disease-causing mechanisms. M. africanum is the main cause of TB in West African countries including Ghana, while M. tb is responsible for TB cases in most other parts of the world, with stratification of clades by geographical location. TB is a multifactorial disease, influenced by environmental factors, bacterial virulence, and the genetic susceptibility of the host. While the genetic susceptibility of the host to TB disease has been extensively studied using genome-wide association studies and candidate gene studies, no method currently exists to perform an association analysis between the genetic architecture of the host and susceptibility to the many clades of M. tb or M. africanum causing disease. Two geographically distinct cohorts were included in this study: a cohort of 947 participants self-identifying as belonging to the five-way admixed South African Coloured (SAC) population, with paired infecting M. tb isolate information, was used to establish the protocol for performing the association analysis, while a second cohort of 3 311 participants recruited in Ghana was used to validate this method. The method developed includes quality control filters on both the host genotype data and the infecting isolate database.
Thereafter, haplotype phasing and genotype imputation using several reference panels were performed to increase the number of single nucleotide polymorphisms (SNPs) available for association testing. An assessment of imputation quality scores identified the best imputation reference panel for each study cohort, and a multinomial logistic regression (MLR) analysis was performed to assess potential associations between host genotypes and infecting bacterial clades of multiple classes. Here, we demonstrated that the African Genome Resource (used via the Sanger Imputation Server) produced the highest quality of imputed genotype data for the SAC cohort, while the 1000 Genomes Phase 3 reference panel was the best reference panel for the Ghanaian cohort. MLR was performed while controlling for covariates including age, sex, and ancestry proportions. After genotype imputation, 445 SAC and 1 272 Ghanaian participants passed quality control and were tested for association with five and six infecting superclades, respectively. Models of association revealed no SNPs reaching genome-wide significance for the SAC cohort, while 32 SNPs met the GWAS cut-off of 5 × 10⁻⁸ for the Ghanaian cohort. For the Ghanaian cohort, the risk allele of SNP rs551641937 (g.62385889G>A), located on chromosome 15, was determined to increase the risk of TB caused by the EAI/AFRI superclade 276-fold when compared to the LAMCAM reference superclade. The emphasis of the dissertation was to perform an association analysis using host genotype and pathogen data, and finding the best reference panel for imputing each of the two datasets was a secondary aim. This study demonstrates the first method successfully testing host-genotype associations with multiple clades of M. tb isolates causing disease.
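
As a rough illustration of how an MLR result of this kind can be read, the following Python sketch converts a per-superclade coefficient into a ratio relative to the reference superclade and checks a p-value against the genome-wide cut-off. It is not the thesis pipeline. The function names are hypothetical, and it assumes that the 276-fold figure is an exponentiated MLR coefficient relative to the LAMCAM reference superclade, which the abstract does not state explicitly.

```python
import math

# Conventional GWAS cut-off quoted in the abstract.
GENOME_WIDE_ALPHA = 5e-8


def relative_risk_ratio(beta):
    """Exponentiate an MLR coefficient (log scale, relative to the reference class)."""
    return math.exp(beta)


def class_probabilities(etas, reference="LAMCAM"):
    """Softmax over superclades, with the reference class fixed at a linear predictor of 0.

    `etas` maps each non-reference superclade to its linear predictor
    (intercept plus coefficient times covariate values) for one individual.
    """
    scores = {reference: 0.0, **etas}
    shift = max(scores.values())  # subtract the maximum for numerical stability
    exps = {clade: math.exp(eta - shift) for clade, eta in scores.items()}
    total = sum(exps.values())
    return {clade: value / total for clade, value in exps.items()}


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """True when the p-value falls strictly below the genome-wide threshold."""
    return p_value < alpha


# Assumed illustration: a coefficient whose exponent equals the reported 276-fold ratio.
beta = math.log(276)
probs = class_probabilities({"EAI/AFRI": beta})
ratio = probs["EAI/AFRI"] / probs["LAMCAM"]
```

In this two-class toy case, `ratio` recovers the exponentiated coefficient, which is why such a coefficient is interpreted relative to the chosen reference superclade rather than as an absolute risk.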

AFRIKAANS ABSTRACT (translated into English): Tuberculosis (TB) was declared a global health problem by the World Health Organization in 1993. Despite decades of extensive biomedical research, TB remains a leading cause of death worldwide. TB is a disease that primarily affects the lungs and manifests after infection with a pathogenic member of the Mycobacterium tuberculosis complex (MTBC), namely M. africanum and M. tb. Each member of the MTBC consists of several strains (or clades) that differ in virulence. M. africanum is the main cause of TB in West African countries including Ghana, while Mycobacterium tuberculosis (M. tb) is responsible for TB cases in most other parts of the world, with clades that can be grouped by their geographical location. TB is a complex disease influenced by several factors, including environmental factors, bacterial virulence, and the genetic susceptibility of the host. Several studies, including genome-wide association studies (GWAS) and candidate gene studies, have been carried out to investigate the genetic susceptibility of the host to TB. To date, there is no method for analysing associations between the genetic structure of the host and susceptibility to any of the several clades of M. tb or M. africanum. The study used two cohorts from different geographical areas: a group of 947 participants who self-identified as part of the South African Coloured (SAC) population, and a second group of 3 311 participants from Ghana. The SAC group, descended from five ancestral populations and with corresponding M. tb isolate information, was used to develop the protocol for host genotype-to-infecting clade analysis. A second group from Ghana was included to validate the method. The method includes quality control filters for both the host genotype data and the infecting isolate database. The next step was haplotype phasing and genotype imputation.
This was performed with several reference panels to increase the number of single nucleotide polymorphisms (SNPs) available for association testing. Imputation quality was assessed to choose the best reference panel for each cohort, after which multinomial logistic regression (MLR) analysis was used to identify potential associations between the host genotype and infecting bacterial clades of multiple classes. This study demonstrates that the African Genome Resource (used via the Sanger Imputation Server) gave the best-quality imputation of genotype data for the SAC population, while the 1000 Genomes Phase 3 reference panel was the best for the Ghanaian cohort. The MLR analysis took age, sex, and genetic ancestry into account. After genotype imputation, 445 SAC and 1 272 Ghanaian participants passed the quality control steps and were tested separately for possible associations with five and six infecting superclades, respectively. The models of association did not highlight any SNPs in the SAC population that were statistically significant genome-wide, but 32 SNPs had a p-value below 5 × 10⁻⁸ for the Ghanaian cohort. One of these SNPs, rs551641937 (g.62385889G>A), located on chromosome 15, was found to increase the risk of TB associated with the EAI/AFRI superclade 276-fold compared with the LAMCAM superclade. The emphasis of the dissertation was to perform an association analysis using host genotype and pathogen data, and finding the best reference panel to apply to each of the two datasets was a secondary aim. This study demonstrates the first method successfully used to test for associations between host genotype and multiple clades causing TB.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/105592