MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities

Citations: 159
Authors
Liu, Yongchao [1 ]
Schmidt, Bertil [1 ]
Maskell, Douglas L. [1 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Keywords
PROTEIN-STRUCTURE; DATABASE; BENCHMARK; ALGORITHM; ACCURACY; BALIBASE; STRATEGY; COFFEE; MUSCLE; MAFFT;
DOI
10.1093/bioinformatics/btq338
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge. Results: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. Furthermore, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile-profile alignment, are incorporated to improve alignment accuracy. Assessed using the popular benchmarks BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners.
Pages: 1958-1964
Number of pages: 7