Correcting the bias of empirical frequency parameter estimators in codon models

dc.contributor.author: Kosakovsky Pond, Sergei
dc.contributor.author: Delport, Wayne
dc.contributor.author: Muse, Spencer V.
dc.contributor.author: Scheffler, Konrad
dc.date.accessioned: 2013-02-21T15:53:20Z
dc.date.available: 2013-02-21T15:53:20Z
dc.date.issued: 2010-07
dc.description: The original publication is available at http://www.plosone.org/ [en_ZA]
dc.description.abstract: Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard F3×4 estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of 856 sequence alignments, our estimators show a significant improvement in goodness of fit compared to the F3×4 approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the F3×4-style estimators. [en_ZA]
dc.description.sponsorship: Joint Division of Mathematical Sciences/National Institute of General Medical Sciences Mathematical Biology Initiative [en_ZA]
dc.description.sponsorship: National Institutes of Health [en_ZA]
dc.description.sponsorship: San Diego Center for AIDS Research [en_ZA]
dc.description.sponsorship: NIAID Developmental Award [en_ZA]
dc.description.version: Publisher's version [en_ZA]
dc.format.extent: 5 p. ; ill.
dc.identifier.citation: Kosakovsky Pond, S., Delport, W., Muse, S.V. & Scheffler, K. 2010. Correcting the bias of empirical frequency parameter estimators in Codon Models. PLoS ONE, 5(7): e11230, doi:10.1371/journal.pone.0011230. [en_ZA]
dc.identifier.issn: 1932-6203 (online)
dc.identifier.other: doi:10.1371/journal.pone.0011230
dc.identifier.uri: http://hdl.handle.net/10019.1/79596
dc.language.iso: en_ZA [en_ZA]
dc.publisher: Public Library of Science -- PLOS [en_ZA]
dc.rights.holder: Authors retain copyright [en_ZA]
dc.subject: Markov processes [en_ZA]
dc.subject: biological processes [en_ZA]
dc.subject: Goodness of fit [en_ZA]
dc.subject: Nucleotide counts [en_ZA]
dc.subject: Codon substitution models [en_ZA]
dc.title: Correcting the bias of empirical frequency parameter estimators in codon models [en_ZA]
dc.type: Article [en_ZA]
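
Note on the abstract: the bias it describes can be illustrated with a minimal sketch. The de facto standard estimator multiplies position-specific nucleotide frequencies and renormalizes over the sense codons. Because stop codons are dropped, the nucleotide frequencies implied by the resulting codon distribution no longer match the observed ones. The function names, the toy frequencies, and the use of the universal genetic code below are assumptions made for illustration. This is not the authors' corrected estimator, only a demonstration of the mismatch it is meant to address.

```python
from itertools import product

NUCS = "TCAG"
# Assumption: universal genetic code, which has three stop codons.
STOPS = {"TAA", "TAG", "TGA"}


def f3x4(pos_freqs):
    """Product-of-position-frequencies estimator, renormalized over sense codons.

    pos_freqs: list of three dicts (nucleotide -> frequency), one per codon position.
    """
    raw = {}
    for a, b, c in product(NUCS, repeat=3):
        codon = a + b + c
        if codon in STOPS:
            continue  # stop codons are excluded from the state space
        raw[codon] = pos_freqs[0][a] * pos_freqs[1][b] * pos_freqs[2][c]
    total = sum(raw.values())  # less than 1 because stop-codon mass was dropped
    return {codon: p / total for codon, p in raw.items()}


def implied_position_freqs(codon_freqs):
    """Marginal nucleotide frequencies at each position under a codon distribution."""
    out = [dict.fromkeys(NUCS, 0.0) for _ in range(3)]
    for codon, p in codon_freqs.items():
        for pos, nuc in enumerate(codon):
            out[pos][nuc] += p
    return out


# Hypothetical observed position-specific frequencies (toy numbers, not real data).
observed = [
    {"T": 0.40, "C": 0.20, "A": 0.25, "G": 0.15},
    {"T": 0.25, "C": 0.20, "A": 0.35, "G": 0.20},
    {"T": 0.20, "C": 0.25, "A": 0.30, "G": 0.25},
]

codon_freqs = f3x4(observed)
implied = implied_position_freqs(codon_freqs)

for pos in range(3):
    for nuc in NUCS:
        print(pos + 1, nuc, observed[pos][nuc], round(implied[pos][nuc], 4))
```

All three stop codons in the universal code begin with T, so the implied first-position frequency of T comes out below the observed 0.40 in this toy example, while the other first-position frequencies are inflated to compensate.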
Files
Original bundle
Name: kosakovskypond_correcting_2010.pdf
Size: 326.44 KB
Format: Adobe Portable Document Format