Browsing by Author "Foster, Bryce John"
- Item: Heterologous expression and partial characterisation of enzymes predicted in silico by deep feed-forward neural networks (Stellenbosch : Stellenbosch University, 2024-03)
Foster, Bryce John; Patterton, Hugh-George; Stellenbosch University. Faculty of Science. Dept. of Biochemistry.
ENGLISH ABSTRACT: Inverse protein folding (IPF) involves the prediction of amino acid sequences that will fold into a specified three-dimensional (3D) structure. The implementation of advanced algorithms has expedited progress in structural bioinformatics, with increasing application of machine learning (ML) approaches. This has directly contributed to the unparalleled success of modern protein structure prediction methods that are capable of predicting the 3D atomic coordinates of complex structures from only their protein sequences with near-experimental accuracy. Comparable progress has been seen for reverse folding predictors, which are now largely reliant on ever-evolving neural network architectures. IPF tools that are guided by the physical principles of protein folding offer insight and potential for rational protein design and enzyme engineering that have long been unattainable. However, the outputs of many of these tools have only been assessed in silico by means of ML folding algorithms; consequently, a gap exists between their potential and realised application. SeqPredNN is an in-house feed-forward IPF neural network trained using features extracted from the subset of Protein Data Bank entries that have less than 90% identity. Similarly, ProteinMPNN is a deep learning-based protein sequence design method. Unlike ProteinMPNN, SeqPredNN is yet to be applied in vitro, and therefore lacks experimental validation. This study used both SeqPredNN and ProteinMPNN to predict novel protein sequences for Bacillus subtilis lipase A (LipA) and Streptomyces griseus trypsin (SGT).
The SeqPredNN sequence recovery rates were <40%, while the ProteinMPNN predictions were >60% identical to the native sequences. Following the re-introduction of all residues deemed necessary for catalysis, the curated sequences were folded using AlphaFold2. The resulting conformations for both IPF tools were remarkably similar to the native X-ray crystal structures. Further molecular dynamics simulations and ligand docking showed that, despite vastly different amino acid sequences, the IPF enzymes were expected to possess physicochemical properties largely comparable to those of the native counterparts. To validate this experimentally, the novel proteins were produced in recombinant Escherichia coli and Pichia pastoris strains. Noticeably lower levels of heterologous protein expression were observed for the IPF variants, particularly for SeqPredNN LipA, compared to the native proteins. Furthermore, catalytic activity was significantly reduced or completely lost in the predicted enzymes. This is likely due to modifications of the electrostatic surface potential and active site topology that no longer facilitate correct substrate interactions, which are deemed to be repercussions of the highly unique protein sequences. Preliminary characterisation via circular dichroism yielded empirical secondary structure compositions that differed from the expected values, adding to the lack of congruence between the computational and experimental results. Importantly, this study provides the first experimental implementation of SeqPredNN, and emphasises critical considerations for difficult protein design targets. However, the ultimate utility of a reverse folding tool lies in its ability to predict protein sequences that will achieve structures capable of performing intended functions. 
Accordingly, while these tools offer an unprecedented starting point for rational protein modification, a more holistic approach with continued experimental validation may be required for greater success.
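
The sequence recovery rates quoted in the abstract are usually understood as the percentage of positions at which a designed sequence carries the same residue as the native one. The following is a minimal sketch of that calculation, assuming the two sequences are already the same length with one residue per structural position. The function name `sequence_recovery` and the toy sequences are illustrative assumptions and are not taken from the thesis, which may compute the metric differently.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the percentage of positions where the designed residue
    matches the native residue (position-wise identity)."""
    if len(native) != len(designed):
        # Assumption: one designed residue per native backbone position.
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return 100.0 * matches / len(native)


# Toy sequences for illustration only; not LipA or SGT.
native = "MKVLAAGIVG"
designed = "MKILSAGLVG"
print(f"{sequence_recovery(native, designed):.1f}%")  # 7 of 10 positions match
```

A position-wise comparison like this is only meaningful when the designed sequence was predicted for the same backbone; sequences of differing length would first need an alignment.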