Measuring the distance between multiple sequence alignments

Cited by: 49
Authors
Blackburne, Benjamin P. [1]
Whelan, Simon [1]
Affiliations
[1] Univ Manchester, Fac Life Sci, Manchester M13 9PT, Lancs, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
PROTEIN; RELIABILITY; ERRORS; HEADS; TAILS; MODEL; GAPS;
DOI
10.1093/bioinformatics/btr701
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses.
Results: We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them.
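The abstract does not define the four metrics, so the following is only a minimal sketch of one plausible way to measure a distance between two alignments of the same sequences. It assumes alignments are given as equal-length strings with `-` for gaps, and the names `partner_table` and `alignment_distance` are hypothetical. For every residue it records what that residue is aligned to in each other sequence, then reports the fraction of records that differ between the two alignments. The optional `gap_positions` flag illustrates the idea of taking into account where in a sequence a gap occurs; it is an assumption for illustration, not the authors' definition, and no tree-based (indel location) variant is attempted.

```python
from typing import Dict, Hashable, List, Tuple

GAP = "-"

def partner_table(aln: List[str], gap_positions: bool) -> Dict[Tuple[int, int, int], Hashable]:
    """Map (sequence i, residue index k, other sequence j) to what residue k
    of sequence i is aligned with in sequence j: a residue index, or a gap label."""
    n = len(aln)
    seen = [0] * n  # number of residues consumed so far in each row
    table: Dict[Tuple[int, int, int], Hashable] = {}
    for col in range(len(aln[0])):
        here = []  # residue index per row in this column, None for a gap
        for r in range(n):
            if aln[r][col] == GAP:
                here.append(None)
            else:
                here.append(seen[r])
                seen[r] += 1
        for i in range(n):
            if here[i] is None:
                continue
            for j in range(n):
                if i == j:
                    continue
                if here[j] is not None:
                    table[(i, here[i], j)] = here[j]
                elif gap_positions:
                    # Label the gap by how many residues of j precede it.
                    table[(i, here[i], j)] = ("gap", seen[j])
                else:
                    table[(i, here[i], j)] = "gap"
    return table

def alignment_distance(a: List[str], b: List[str], gap_positions: bool = False) -> float:
    """Fraction of residue-to-sequence records that differ between a and b."""
    ta = partner_table(a, gap_positions)
    tb = partner_table(b, gap_positions)
    if ta.keys() != tb.keys():
        raise ValueError("alignments must contain the same ungapped sequences")
    if not ta:
        return 0.0
    return sum(1 for key in ta if ta[key] != tb[key]) / len(ta)

# Two alignments of the same pair of sequences ("AC" and "AGTC").
aln_a = ["A--C", "AGTC"]
aln_b = ["AC--", "AGTC"]
print(alignment_distance(aln_a, aln_b))
print(alignment_distance(aln_a, aln_b, gap_positions=True))
```

In this toy case the position-aware variant reports a larger distance, because a residue aligned to a gap in both alignments still counts as different when the gap sits at a different place in the other sequence.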
Pages: 495-502
Page count: 8