A metagenomic approach using next-generation sequencing for viral profiling of a vineyard and genetic characterization of grapevine virus E

Coetzee, Beatrix (2010-12)

Thesis (MSc (Genetics))--University of Stellenbosch, 2010.

Includes bibliography.

Title page: Dept. of Genetics, Faculty of Science

Thesis

ENGLISH ABSTRACT: Next-generation sequencing technologies are increasingly used in metagenomic studies, largely due to the high sequence data throughput capacity and unbiased approach in determining the genetic composition of an unknown environmental sample. This study investigated the applicability of the Illumina next-generation sequencing platform for metagenomic sequencing of grapevine viruses to provide the first complete viral profile, or virome, of a diseased vineyard. Leaf material was harvested from 44 randomly selected vines in a leafroll-diseased vineyard in South Africa. Sample material was pooled and double-stranded RNA extracted. The dsRNA was sequenced as a paired-end sequencing run using the Illumina sequencing-by-synthesis technique, and more than 19 million sequence reads, equivalent to approximately 837 megabases of metagenomic sequence data, were obtained. Of these data, approximately 400 megabases could be assembled into 449 scaffolds, using the de novo assembler Velvet. These scaffolds were subjected to BLAST searches against the NCBI databases and top hit scores were used for virus identification. Based on the BLAST results, suitable sequences were selected from the NCBI database and used as reference sequences in MAQ mapping assemblies. The bioinformatic analyses allowed for the determination of the virus species present, the most prominent variants, and the relative abundance of each. Four known grapevine viral pathogens were identified. Grapevine leafroll-associated virus 3, representing 59% of the analyzed short read sequence data, was identified as the most prominent virus species. Three variants of this virus were detected: GP18 was the most abundant, followed by a minor Cl766/NY1 variant and a potential novel grapevine leafroll-associated ampelovirus. A single Grapevine rupestris stem pitting-associated virus variant, similar to SG1, and a Grapevine virus A variant, a member of molecular group III, were identified.
This study is also the first to report the presence of Grapevine virus E (GVE) in South African vineyards. Grapevine virus E was further genetically characterized and the genome sequence of GVE isolate SA94 determined. The GVE SA94 genome sequence, 7568 nucleotides in length, is the first complete genome sequence for the virus species. The genome organization of GVE SA94 is typical of vitiviruses, but in contrast to other RNA viruses, the AlkB domain is located within the helicase domain in open reading frame 1 (ORF 1). Grapevine virus E SA94 shares nearly 100% nucleotide identity with the Japanese TvP15 isolate and GVE 3404, a de novo scaffold generated from the metagenomic sequence data. Bioinformatic analysis of metagenomic sequence data further revealed the presence of three fungus-infecting viral families, Chrysoviridae, Totiviridae and the unclassified dsRNA virus, Fusarium graminearum dsRNA mycovirus 4. A virus from the family Chrysoviridae, similar to Penicillium chrysogenum virus, was the second most abundant virus detected. We demonstrated the successful application of a short read sequencing technology, such as the Illumina platform, for viral profiling of an infected vineyard. To our knowledge this is the first application of the Illumina technology for this purpose.
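The abstract states that assembled scaffolds were searched with BLAST and that top hit scores were used for virus identification. As an illustration only, and not the thesis's actual scripts, the following minimal Python sketch shows one way a top hit per scaffold could be picked from tabular BLAST output. It assumes the standard 12-column tab-separated layout (query, subject, ..., e-value, bit score); the function name `top_hits` and the scaffold and reference names in the example are hypothetical placeholders, not results from the study.

```python
import csv
import io


def top_hits(blast_tabular):
    """Return {query: (subject, bit_score)} keeping the highest bit score per query.

    Assumes 12-column tab-separated BLAST output, where column 1 is the
    query ID, column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    reader = csv.reader(io.StringIO(blast_tabular), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, bit_score = row[0], row[1], float(row[11])
        if query not in best or bit_score > best[query][1]:
            best[query] = (subject, bit_score)
    return best


# Made-up placeholder rows, NOT data from the thesis.
example = "\n".join([
    "scaffold_1\tref_A\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120.0",
    "scaffold_1\tref_B\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t250.0",
    "scaffold_2\tref_C\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-10\t60.0",
])

for scaffold, (reference, score) in sorted(top_hits(example).items()):
    print(scaffold, reference, score)
```

Selecting by bit score is one common convention; ranking by e-value or applying a minimum identity threshold would be equally plausible choices, and the abstract does not specify which criterion was applied.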

AFRIKAANSE OPSOMMING: Volgende-generasie tegnologie om basis volgordes van nukleiensure te bepaal, word al meer gebruik in metagenomiese studies. Dit is veral weens die hoe data-omset kapasiteit en onbevooroordeelde aanslag in die bepaling van die genetiese samestelling van onbekende omgewingsmonsters. Hierdie studie het die aanwending van die Illumina volgende-generasie volgorde-bepalingsplatform in 'n metagenomiese studie van wingerdvirusse, ondersoek. Dit het ten doel gehad om die eerste volledige virus profiel, of viroom, van 'n geinfekteerde wingerd saam te stel. Blaarmateriaal is verkry vanaf 44 lukraak-gekose wingerdstokke in 'n rolblad-geinfekteerde wingerd in Suid-Afrika. Monster materiaal is saamgevoeg en dubbelstring-RNS geekstraheer. Die dubbelstring-RNS is onderwerp aan gepaarde-ent volgorde-bepaling deur gebruik te maak van die Illumina volgorde-bepaling-deur-sintese tegniek. Meer as 19 miljoen volgorde reekse, ekwivalent aan ongeveer 837 megabasisse volgorde data, is verkry. Van hierdie data kon ongeveer 400 megabasisse saamgevoeg word in 449 konstrukte ("scaffolds"), deur gebruik te maak van die de novo samesteller Velvet. Hierdie konstrukte is onderwerp aan BLAST soektogte teen die NCBI databasisse en die hoogste trefslag-telling is gebruik vir virus identifikasie. Op grond van die BLAST resultate is geskikte volgordes geselekteer vanaf die NCBI databasis en gebruik as verwysingvolgordes in MAQ kartering-analises. Met die bioinformatika analises kon die virus spesies teenwoordig, asook die mees prominente variante en relatiewe voorkoms van elk, bepaal word. Vier bekende virus wingerdpatogene is geidentifiseer. Grapevine leafroll-associated virus 3, verteenwoordig deur 59% van die geanaliseerde kort-reeks volgorde data, is geidentifiseer as die mees prominente virus spesie.
Drie variante van die virus is in die wingerdmonster opgespoor: GP18 kom die mees algemeen voor, gevolg deur 'n CL-766/NY1 variant en 'n potensiele nuwe wingerd rolblad-geassosieerde ampelovirus. 'n Enkele Grapevine rupestris stem pitting-associated virus variant, soortgelyk aan SG1, en 'n Grapevine virus A variant, 'n lid van molekulere groep III, is geidentifiseer. Hierdie studie is ook die eerste om die teenwoordigheid van Grapevine virus E (GVE) in Suid-Afrikaanse wingerde te rapporteer. Grapevine virus E is verder geneties gekarakteriseer en die genoomvolgorde van GVE isolaat SA94 is bepaal. Die GVE SA94 genoomvolgorde, 7568 nukleotiede lank, is die eerste volledige genoomvolgorde vir hierdie virus spesie. Die genoomorganisasie is tipies van vitivirusse, maar in kontras met ander RNA virusse is die AlkB domein binne-in die helikase domein van oopleesraam 1 (ORF 1) geleë. Grapevine virus E SA94 deel byna 100% nukleotied identiteit met die Japannese TvP15 isolaat en GVE 3404, 'n de novo konstruk gegenereer vanaf die metagenomiese volgorde data. Bioinformatika analises van die metagenomiese volgorde data het verder die teenwoordigheid van drie swam-infekterende virus families, die Chrysoviridae, Totiviridae en ongeklassifiseerde dubbelstring-RNS virus, Fusarium graminearum dsRNA mycovirus 4, aangetoon. 'n Virus van die Chrysoviridae familie, soortgelyk aan Penicillium chrysogenum virus, het die tweede meeste voorgekom in die wingerd monster. Hierdie studie demonstreer die suksesvolle toepassing van 'n kort reeks volgorde-bepalingstegnologie soos die Illumina platform, vir die opstel van 'n virusprofiel van 'n geinfekteerde wingerd. Sover ons kennis strek is hierdie die eerste aanwending van die Illumina tegnologie vir hierdie doel.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/5186