Browsing by Author "Zimire, Darryn"
- Simulating read length, sequencing depth and base-call quality for RNA-sequencing experimental design (Stellenbosch : Stellenbosch University, 2021-12) Zimire, Darryn; Tromp, Gerard; Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences: Molecular Biology and Human Genetics.

ENGLISH ABSTRACT: RNA-sequencing (RNA-seq) is a quantitative high-throughput sequencing biotechnology developed to analyse and provide insights into the molecular biology of the transcriptome. An appropriate experimental design and analysis strategy is essential for RNA-seq experiments and requires statistical methods suited to modelling the characteristics of sequencing data, which take the form of a matrix of read counts per genomic feature, each count serving as a digital estimate of relative expression. Sequencing depth, read length and data quality are of particular importance when planning and analysing RNA-seq experiments because these factors can be chosen before the experiment is conducted. The number of reads generated for an experiment affects the statistical power to draw biological conclusions. Read length, together with its associated base-call quality, influences the mappability of the sequencing data and, in turn, the extent of information loss: shorter reads tend to map to multiple locations when aligned to the reference genome or transcriptome. The quality of the data also affects downstream analysis and can lead to data being discarded, diminishing the ability to draw biological insights from the experimental data with confidence. To assist in the design of RNA-seq experiments, I present an RNA-seq data simulator (RSDS), a proof-of-concept computer simulator written in the Python programming language for simulating raw RNA-seq data. RSDS simulates both single-end and paired-end RNA-seq data, with sequencing depth, read length and base-call quality as tuneable settings.
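To make the three tuneable settings concrete, here is a minimal Python sketch. It is not the RSDS implementation, and every name in it is hypothetical. It treats sequencing depth as the number of reads, samples each read's source transcript in proportion to assumed expression weights, cuts a fixed-length read from a uniformly chosen start position, and injects substitution errors at the rate implied by a constant Phred base-call quality, P = 10^(-Q/10). Records are written in Phred+33 FASTQ encoding. The constant-quality model and uniform start positions are simplifying assumptions.

```python
import random

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def simulate_single_end(transcripts, weights, depth, read_length, quality, seed=0):
    """Hypothetical sketch: simulate `depth` single-end reads as FASTQ records.

    transcripts: dict mapping transcript name -> nucleotide sequence
    weights:     dict mapping transcript name -> relative expression (any scale)
    quality:     constant Phred score (0-93) assigned to every base
    """
    rng = random.Random(seed)
    # Only transcripts at least as long as the read can yield a full-length read.
    names = [n for n, s in transcripts.items() if len(s) >= read_length]
    if not names:
        raise ValueError("no transcript is long enough for the requested read length")
    w = [weights[n] for n in names]
    err = phred_to_error(quality)
    qual_char = chr(quality + 33)  # Phred+33 encoding of the constant quality
    records = []
    for i in range(depth):
        # Choose the source transcript in proportion to its expression weight.
        name = rng.choices(names, weights=w, k=1)[0]
        seq = transcripts[name]
        start = rng.randrange(len(seq) - read_length + 1)
        read = list(seq[start:start + read_length])
        # Substitute each base with probability `err`, as the quality score implies.
        for j, base in enumerate(read):
            if rng.random() < err:
                read[j] = rng.choice([b for b in BASES if b != base])
        records.append(
            f"@read{i}_{name}_{start}\n{''.join(read)}\n+\n{qual_char * read_length}"
        )
    return records


# Example: 1,000 reads of 75 bases at Q30 from two toy transcripts.
toy = {"tx1": "ACGT" * 50, "tx2": "GGCCTTAA" * 40}
reads = simulate_single_end(toy, {"tx1": 3.0, "tx2": 1.0}, depth=1000, read_length=75, quality=30)
```

Raising `quality` lowers the injected error rate, and lengthening `read_length` excludes transcripts too short to contain a full read. Paired-end simulation, which would draw a fragment and read both of its ends, is omitted from this sketch.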
A two-group differential expression experiment can be simulated using RSDS. I describe, validate and implement the RSDS simulator and demonstrate its use for generating raw synthetic RNA-seq data by varying sequencing depth, read length and base-call quality. I demonstrate the ability of RSDS to reproduce a transcript expression profile from an input matrix of read counts derived from a real RNA-seq experiment, and to produce a two-group differential expression experiment with varying fold changes and expression levels.
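The two-group design can be sketched in a similar way, again as an illustration and not as RSDS's method. The sketch assumes a baseline expression profile given as read counts per feature, for example one sample's column of a real count matrix. The hypothetical helper `group_proportions` converts those counts to proportions, multiplies selected features in group B by a fold change, and renormalises. `simulate_counts` then draws each replicate's counts multinomially at a fixed sequencing depth. This models only sampling (Poisson-like) noise, not the biological overdispersion that count models such as the negative binomial are usually chosen to capture.

```python
import random
from collections import Counter


def group_proportions(counts, fold_changes):
    """Hypothetical helper: baseline counts -> (group A, group B) expression proportions.

    counts:       dict mapping feature -> read count from a real experiment
    fold_changes: dict mapping feature -> fold change applied in group B only
    """
    total = sum(counts.values())
    base = {g: c / total for g, c in counts.items()}
    scaled = {g: p * fold_changes.get(g, 1.0) for g, p in base.items()}
    s = sum(scaled.values())
    # Renormalise so group B proportions sum to 1 again.
    return base, {g: p / s for g, p in scaled.items()}


def simulate_counts(props, depth, n_reps, seed=0):
    """Draw multinomial read counts for `n_reps` replicates at a fixed sequencing depth."""
    rng = random.Random(seed)
    genes = list(props)
    w = [props[g] for g in genes]
    reps = []
    for _ in range(n_reps):
        # Assign each of `depth` reads to a feature in proportion to its expression.
        tally = Counter(rng.choices(genes, weights=w, k=depth))
        reps.append({g: tally.get(g, 0) for g in genes})
    return reps


baseline = {"geneA": 500, "geneB": 300, "geneC": 200}
props_a, props_b = group_proportions(baseline, {"geneA": 2.0})
group_a = simulate_counts(props_a, depth=10_000, n_reps=3, seed=1)
group_b = simulate_counts(props_b, depth=10_000, n_reps=3, seed=2)
```

Because of renormalisation, a two-fold change in one feature slightly lowers the proportions of all the others. In the example, geneA's proportion rises from 0.5 to about 0.667 while its ratio to geneB doubles. This compositional effect is one reason read counts are treated as estimates of relative, not absolute, expression.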