Fast optimization of statistical potentials for structurally constrained phylogenetic models

Cited by: 2
Authors
Bonnard, Cecile [1, 2]
Kleinman, Claudia L. [2]
Rodrigue, Nicolas [3]
Lartillot, Nicolas [2]
Affiliations
[1] LIRMM, Dept Informat, F-34392 Montpellier 5, France
[2] Univ Montreal, Dept Biochim, Montreal, PQ H3C 3J7, Canada
[3] Univ Ottawa, Dept Biol, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
SUBSTITUTION RATES; CODON-SUBSTITUTION; TERTIARY STRUCTURE; PROTEIN EVOLUTION; SEQUENCES; DISTRIBUTIONS; DEPENDENCE; MECHANICS; ALGORITHM; DESIGN;
DOI
10.1186/1471-2148-9-227
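Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
090105 [Crop Production Systems and Ecological Engineering];
Abstract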
Background: Statistical approaches for protein design are relevant to molecular evolutionary studies. In recent years, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and provided what we call the joint potentials. However, the method required numerical estimation using computationally heavy Markov Chain Monte Carlo sampling algorithms.
Results: Here, we develop an alternative optimization procedure, based on a leave-one-out argument coupled to fast gradient descent algorithms. We show that the leave-one-out potential yields results very similar to those of the joint approach developed previously, both in terms of the resulting potential parameters and by Bayes factor evaluation in a phylogenetic context. At the same time, the leave-one-out approach yields a considerable computational benefit (up to a 1,000-fold decrease in computational time for the optimization procedure).
Conclusion: Owing to its computational speed, the optimization method we propose offers an attractive option for the design and empirical evaluation of alternative forms of potentials, using large data sets and high-dimensional parameterizations.
Pages: 13
Related papers
42 in total
[1] PRINCIPLES THAT GOVERN FOLDING OF PROTEIN CHAINS [J]. ANFINSEN, CB. SCIENCE, 1973, 181 (4096): 223-230
[2] A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank [J]. Bastolla, Ugo; Porto, Markus; Roman, H. Eduardo; Vendruscolo, Michele. BMC EVOLUTIONARY BIOLOGY, 2006, 6 (1)
[3] A METHOD TO IDENTIFY PROTEIN SEQUENCES THAT FOLD INTO A KNOWN 3-DIMENSIONAL STRUCTURE [J]. BOWIE, JU; LUTHY, R; EISENBERG, D. SCIENCE, 1991, 253 (5016): 164-170
[4] Case DA., 2008, AMBER 10 University of California
[5] Optimizing potentials for the inverse protein folding problem [J]. Chiu, TL; Goldstein, RA. PROTEIN ENGINEERING, 1998, 11 (09): 749-752
[6] Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences [J]. Choi, Sang Chul; Redelings, Benjamin D.; Thorne, Jeffrey L. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2008, 363 (1512): 3931-3939
[7] Quantifying the impact of protein tertiary structure on molecular evolution [J]. Choi, Sang Chul; Hobolth, Asger; Robinson, Douglas M.; Kishino, Hirohisa; Thorne, Jeffrey L. MOLECULAR BIOLOGY AND EVOLUTION, 2007, 24 (08): 1769-1782
[8] New algorithm for protein design [J]. Deutsch, JM; Kurosky, T. PHYSICAL REVIEW LETTERS, 1996, 76 (02): 323-326
[9] A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood [J]. Guindon, S; Gascuel, O. SYSTEMATIC BIOLOGY, 2003, 52 (05): 696-704
[10] Evolutionary distances for protein-coding sequences: Modeling site-specific residue frequencies [J]. Halpern, AL; Bruno, WJ. MOLECULAR BIOLOGY AND EVOLUTION, 1998, 15 (07): 910-917