Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

Cited by: 17
Authors
Dickson, Russell J. [1 ]
Wahl, Lindi M. [2 ]
Fernandes, Andrew D. [1 ,2 ]
Gloor, Gregory B. [1 ]
Affiliations
[1] Univ Western Ontario, Dept Biochem, London, ON, Canada
[2] Univ Western Ontario, Dept Appl Math, London, ON N6A 5B9, Canada
Source
PLOS ONE | 2010 / Volume 5 / Issue 06
Keywords
COEVOLVING RESIDUES; COEVOLUTION; INFORMATION; IDENTIFICATION; PHYLOGENY; ACCURACY; SYSTEM;
DOI
10.1371/journal.pone.0011082
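Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes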
07 ; 0710 ; 09 ;
Abstract
Background: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.
Methodology/Principal Findings: We show that current criteria are insufficient to build alignments for use with covariation analyses, as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.
Conclusions/Significance: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation.
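
The abstract does not name the covariation statistics it evaluates. As a rough illustration of what a pairwise covariation score measures, the sketch below computes plain mutual information between two alignment columns, the quantity that several of the related papers listed in this record build on. It is a minimal sketch and not the authors' method: the toy alignment and the function names `column` and `mutual_information` are assumptions made for this example, and rows with a gap in either column are simply skipped.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns.

    Rows with a gap ('-') in either column are skipped; this gap
    handling is an assumption of the sketch.
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    count_ab = Counter(pairs)
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignment: columns 1 and 3 change together,
# columns 0 and 2 are fully conserved.
toy_alignment = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]

covarying = mutual_information(column(toy_alignment, 1), column(toy_alignment, 3))
conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
print(covarying, conserved)
```

In this toy case the two co-varying columns score 1 bit and any pair involving a fully conserved column scores 0, which is why conservation and covariation are treated as separate signals in the abstract.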
Pages: 11
Related papers
31 in total
[1]   Correlations among amino acid sites in bHLH protein domains: An information theoretic analysis [J].
Atchley, WR ;
Wollenberg, KR ;
Fitch, WM ;
Terhalle, W ;
Dress, AW .
MOLECULAR BIOLOGY AND EVOLUTION, 2000, 17 (01) :164-178
[2]   Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction [J].
Dunn, S. D. ;
Wahl, L. M. ;
Gloor, G. B. .
BIOINFORMATICS, 2008, 24 (03) :333-340
[3]   Multiple sequence alignment [J].
Edgar, Robert C. ;
Batzoglou, Serafim .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2006, 16 (03) :368-373
[4]   A novel method for detecting intramolecular coevolution: Adding a further dimension to selective constraints analyses [J].
Fares, Mario A. ;
Travers, Simon A. A. .
GENETICS, 2006, 173 (01) :9-23
[5]   AN IMPROVED METHOD FOR DETERMINING CODON VARIABILITY IN A GENE AND ITS APPLICATION TO RATE OF FIXATION OF MUTATIONS IN EVOLUTION [J].
FITCH, WM ;
MARKOWITZ, E .
BIOCHEMICAL GENETICS, 1970, 4 (05) :579-+
[6]   Advances in protein structure prediction and de novo protein design: A review [J].
Floudas, CA ;
Fung, HK ;
McAllister, SR ;
Mönnigmann, M ;
Rajgaria, R .
CHEMICAL ENGINEERING SCIENCE, 2006, 61 (03) :966-988
[7]   Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions [J].
Gloor, GB ;
Martin, LC ;
Wahl, LM ;
Dunn, SD .
BIOCHEMISTRY, 2005, 44 (19) :7156-7165
[8]  
GLOOR GB, 2010, MOL BIOL EVOL
[9]   Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments [J].
Gotoh, O .
JOURNAL OF MOLECULAR BIOLOGY, 1996, 264 (04) :823-838
[10]   Cn3D: a new generation of three-dimensional molecular structure viewer [J].
Hogue, CWV .
TRENDS IN BIOCHEMICAL SCIENCES, 1997, 22 (08) :314-316