The use of phylogenetic reconstruction as a predictive tool to functionally identify raffinose family oligosaccharide (RFO) producing glycosyltransferases
Date
2022-04
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Carbohydrate active enzymes (CAZymes) are numerous and diverse enzymes that are involved in
the transport, synthesis, and catalysis of carbohydrates. All known and predicted CAZymes are
housed in the CAZy database (www.cazy.org). Two classes of CAZymes, the glycosyltransferases
(GTs) and the glycosyl hydrolases (GHs), are important in the biosynthesis of a group of
galacto-oligosaccharides termed the raffinose family of oligosaccharides (RFOs). The RFOs are the most
widespread D-galactose (Gal) containing oligosaccharides in higher plants, where they serve a
number of vital natural functions, including carbon transport and storage and amelioration of both
abiotic and biotic stresses. Recently, they have also emerged as powerful prebiotic agents, as they
provide usable carbon that stimulates the growth of health-beneficial gut microbes. Their biosynthesis
occurs through a distinct series of enzymatic reactions that begin with the biosynthesis of galactinol
(Gol), catalysed by galactinol synthase (GolS, GT8, EC 2.4.1.123). Gol then serves
as the galactosyl donor for the biosynthesis of raffinose (Raf) and stachyose (Sta). These reactions
are catalysed by the GHs raffinose synthase (RafS, GH36, EC 2.4.1.82) and stachyose synthase (StaS,
GH36, EC 2.4.1.67), respectively. Numerous entries in genome databases and the CAZy repository
that lack a functional biochemical description are only putatively annotated according to sequence
similarity to orthologous gene sequences. This use of orthologous genes to putatively annotate
proteins, specifically RFO synthesising enzymes, has led to inaccuracies in database records with
regard to functional enzyme annotations, with many RFO related CAZymes putatively
annotated as being similar to both GTs (involved in synthesis) and GHs (involved in hydrolysis).
Consequently, functional characterisations of RafSs and StaSs are historically underrepresented in
the literature because these enzymes are difficult to identify, despite the extensive genome resource databases available
for numerous plant models. The emerging repurposing of phylogenetic reconstructions has shown
increased accuracy when annotating putative enzymes. Online resources such as SIFTER and
PhyloGenes (https://sifter.berkeley.edu/, http://www.phylogenes.org/) can use
phylogenetic trees to accurately identify groupings of proteins that share functional
identities. In this study, we sought to use a phylogenetic reconstruction as a predictive tool for
function, to identify RFO biosynthetic genes (RafS and StaS) in publicly available genome
resource databases where their functional annotations are either putative or unclear. We focused
largely on the newly established legume genome databases, using the known Arabidopsis orthologues
RafS (AtRS5, At5G40390) and StaS (AtRS4, At4G01970) as queries in BLASTn and BLASTp
searches to identify candidate genes. To carefully curate the candidate genes, we then examined key signatures in their amino acid
sequences, including a hallmark 80 amino acid signature that represents a
potential functional discriminator between RafS and StaS proteins.
We then generated Maximum Likelihood and Bayesian Inference trees, rooting them against
Arabidopsis ATSIP2 (At3G57520), a known Raf hydrolysing alkaline α-galactosidase (α-Gal, EC
3.2.1.22). Based on the outcomes of the trees, we selected two legume RafS candidates from barrel
medic (Medicago truncatula) and chickpea (Cicer arietinum). The coding sequences of these genes
were isolated, cloned into a bacterial expression vector and heterologously expressed in E. coli. Using
crude protein extracts, we then tested whether the extracts could produce Raf
when incubated in vitro in the presence of sucrose and galactinol. Using quantitative liquid chromatography-tandem mass
spectrometry (LC-MS/MS), we could not identify a distinct Raf producing capacity for either
gene candidate, nor was a recombinant protein produced, when using the bacterial expression vector
pSF-OXB20 (constitutive promoter). However, when the candidate RafS gene from M. truncatula was subsequently
cloned into the pDEST17™ bacterial expression vector (arabinose inducible promoter), we could
identify Raf synthesis capacity in crude protein extracts. This provided some evidence for
the validity of our phylogenetic reconstruction, as this RafS gene candidate has an unclear functional
annotation in the genome resource databases for M. truncatula.
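The curation step described in the abstract, screening candidates for a hallmark block of about 80 amino acids, can be illustrated with a minimal sketch. This is not the thesis's own procedure. The function names, the toy sequences, the window coordinates, and the 0.8 occupancy threshold are all hypothetical, and the sketch makes no claim about which enzyme class carries the block. It assumes the candidates have already been aligned, with "-" marking gaps, and it only reports whether each sequence has residues in a chosen alignment window.

```python
from typing import Dict

GAP = "-"  # gap character assumed in the aligned sequences


def residues_in_window(aligned_seq: str, start: int, end: int) -> int:
    """Count non-gap residues in alignment columns [start, end)."""
    return sum(1 for ch in aligned_seq[start:end] if ch != GAP)


def has_signature_block(
    alignment: Dict[str, str],
    start: int,
    end: int,
    min_fraction: float = 0.8,  # hypothetical occupancy threshold
) -> Dict[str, bool]:
    """Flag each aligned sequence that is mostly non-gap inside the window."""
    width = end - start
    if width <= 0:
        raise ValueError("window must have positive width")
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return {
        name: residues_in_window(seq, start, end) / width >= min_fraction
        for name, seq in alignment.items()
    }


# Toy alignment (invented sequences): one candidate carries an 80-residue
# block in columns 3..82, the other has a gap across the same columns.
toy_alignment = {
    "candidate_A": "MKT" + "A" * 80 + "GLW",
    "candidate_B": "MKT" + "-" * 80 + "GLW",
}
calls = has_signature_block(toy_alignment, start=3, end=83)
print(calls)
```

In practice the window would be taken from an alignment that includes biochemically characterised reference proteins, so that presence or absence of the block can be compared against known functions before any candidate is labelled.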
AFRIKAANS ABSTRACT: No abstract available.
Description
Thesis (MScAgric)--Stellenbosch University, 2022.
Keywords
Raffinose family oligosaccharides, Glycosyltransferases, Raffinose synthase, Enzyme inhibitors, Phylogenetic reconstruction, Oligosaccharides -- Biotechnology, Carbohydrate active enzymes -- Analysis, Glycosyltransferases -- Utilization -- Synthesis, UCTD