Validation of Coevolving Residue Algorithms via Pipeline Sensitivity Analysis: ELSC and OMES and ZNMI, Oh My!

Cited by: 28
Authors
Brown, Christopher A. [1 ,3 ]
Brown, Kevin S. [2 ,4 ]
Affiliations
[1] Harvard Univ, Dept Chem & Biol Chem, Cambridge, MA 02138 USA
[2] Univ Calif Santa Barbara, Dept Phys, Santa Barbara, CA 93106 USA
[3] Harvard Univ, FAS Ctr Syst Biol, Cambridge, MA 02138 USA
[4] Univ Calif Santa Barbara, Inst Collaborat Biotechnol, Santa Barbara, CA 93106 USA
Source
PLOS ONE | 2010 / Vol. 5 / Issue 06
Keywords
MULTIPLE SEQUENCE ALIGNMENT; CHORISMATE SYNTHASE; ALLOSTERIC COMMUNICATION; CORRELATED MUTATIONS; MUTUAL INFORMATION; PROTEINS; CONSERVATION; ACTIVATION; PREDICTION; NETWORKS;
DOI
10.1371/journal.pone.0010779
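Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract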
Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard-to-control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions.
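The reproducibility idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ZNMI score or their pipeline, neither of which is specified in this record. As assumptions made purely for illustration, pairs of alignment columns are scored with plain mutual information divided by the joint entropy, sub-alignments are drawn by random sampling of sequences, and reproducibility is taken as the mean Jaccard overlap of the top-scoring pairs. All function names and parameters (`normalized_mi`, `sub_alignments`, `reproducibility`, `fraction`, `top_n`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def normalized_mi(col_a, col_b):
    """Mutual information of two columns divided by their joint entropy.

    Assumption: this is one common normalization, not the paper's ZNMI.
    """
    h_ab = entropy(list(zip(col_a, col_b)))
    if h_ab == 0:  # both columns fully conserved: no information to share
        return 0.0
    return (entropy(col_a) + entropy(col_b) - h_ab) / h_ab


def pair_scores(alignment):
    """Score every column pair (i, j) with i < j in an equal-length alignment."""
    cols = [[seq[i] for seq in alignment] for i in range(len(alignment[0]))]
    return {(i, j): normalized_mi(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)}


def sub_alignments(alignment, n_splits, fraction, seed=0):
    """Draw random subsets of sequences (hypothetical resampling scheme)."""
    rng = random.Random(seed)
    k = max(2, int(len(alignment) * fraction))
    return [rng.sample(alignment, k) for _ in range(n_splits)]


def reproducibility(alignment, n_splits=10, fraction=0.5, top_n=2, seed=0):
    """Mean Jaccard overlap of the top-scoring pairs across sub-alignments."""
    if n_splits < 2:
        raise ValueError("need at least two sub-alignments to compare")
    tops = []
    for sub in sub_alignments(alignment, n_splits, fraction, seed):
        scores = pair_scores(sub)
        ranked = sorted(scores, key=scores.get, reverse=True)
        tops.append(set(ranked[:top_n]))
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(tops, 2)]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    toy = ["ACDA", "ACDC", "GTDA", "GTDC", "ACEA", "GTEC", "ACDA", "GTDC"]
    print(reproducibility(toy, n_splits=5, fraction=0.5, top_n=2, seed=1))
```

A value near 1 would mean that the same residue pairs rank highest regardless of which sequences are sampled. The accuracy measure in the abstract (whether high-scoring pairs are close in tertiary structure) would additionally require residue coordinates and is not sketched here.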
Pages: 14