MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities

Cited by: 159

Authors:

Liu, Yongchao [1]

Schmidt, Bertil [1]

Maskell, Douglas L. [1]

Affiliation:

[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

Source:

BIOINFORMATICS | 2010 / Vol. 26 / Issue 16

Keywords:

PROTEIN-STRUCTURE; DATABASE; BENCHMARK; ALGORITHM; ACCURACY; BALIBASE; STRATEGY; COFFEE; MUSCLE; MAFFT;

DOI:

10.1093/bioinformatics/btq338

Chinese Library Classification number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate multiple alignments is still a challenge. Results: We present MSAProbs, a new and practical multiple alignment algorithm for protein sequences. The design of MSAProbs is based on a combination of pair hidden Markov models and partition functions to calculate posterior probabilities. Furthermore, two critical bioinformatics techniques, namely weighted probabilistic consistency transformation and weighted profile-profile alignment, are incorporated to improve alignment accuracy. Assessed on the popular benchmarks BAliBASE, PREFAB, SABmark and OXBENCH, MSAProbs achieves statistically significant accuracy improvements over the existing top-performing aligners, including ClustalW, MAFFT, MUSCLE, ProbCons and Probalign. Furthermore, MSAProbs is optimized for multi-core CPUs by employing a multi-threaded design, leading to a competitive execution time compared to other aligners.
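
As an illustration of two ideas named in the abstract, the sketch below merges two posterior probability matrices for a sequence pair and then applies a weighted probabilistic consistency transformation across a set of sequences. It is not the MSAProbs source code. The root-mean-square merge, the normalisation by the sum of weights, and all function names (`combine_posteriors`, `weighted_consistency`) are assumptions made for this example, and the posterior matrices are taken as given rather than computed from a pair HMM or a partition function.

```python
from math import sqrt


def combine_posteriors(p_hmm, p_pf):
    """Merge two posterior matrices entry by entry.

    Assumption: a root-mean-square combination of the pair-HMM matrix
    and the partition-function matrix.
    """
    return [[sqrt((a * a + b * b) / 2.0) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(p_hmm, p_pf)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def weighted_consistency(post, weights, x, y):
    """Re-estimate the posterior matrix of pair (x, y) using third sequences.

    post[(a, b)] with a < b is a len(a) x len(b) posterior matrix.
    Assumed form: ((w_x + w_y) * P_xy + sum over z of w_z * P_xz * P_zy),
    divided by the sum of all weights.
    """
    def get(a, b):
        return post[(a, b)] if a < b else transpose(post[(b, a)])

    acc = [[(weights[x] + weights[y]) * v for v in row] for row in get(x, y)]
    for z in range(len(weights)):
        if z in (x, y):
            continue
        # Evidence for aligning x_i with y_j that is relayed through z.
        relay = matmul(get(x, z), get(z, y))
        acc = [[a + weights[z] * r for a, r in zip(row_a, row_r)]
               for row_a, row_r in zip(acc, relay)]
    total = sum(weights)
    return [[v / total for v in row] for row in acc]
```

With uniform weights this reduces to an unweighted consistency transformation; non-uniform weights let closely related sequences contribute less to the relayed evidence.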


Pages: 1958-1964

Number of pages: 7

References (41 in total)

[1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].

Altschul, SF ;

Madden, TL ;

Schaffer, AA ;

Zhang, JH ;

Zhang, Z ;

Miller, W ;

Lipman, DJ .

NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402

[2] BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations [J].

Bahr, A ;

Thompson, JD ;

Thierry, JC ;

Poch, O .

NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :323-326

[3] A STRATEGY FOR THE RAPID MULTIPLE ALIGNMENT OF PROTEIN SEQUENCES - CONFIDENCE LEVELS FROM TERTIARY STRUCTURE COMPARISONS [J].

BARTON, GJ ;

STERNBERG, MJE .

JOURNAL OF MOLECULAR BIOLOGY, 1987, 198 (02) :327-337

[4]

BERGER MP, 1991, COMPUT APPL BIOSCI, V7, P479

[5] OPTIMAL PROTEIN-STRUCTURE ALIGNMENTS BY MULTIPLE LINKAGE CLUSTERING - APPLICATION TO DISTANTLY RELATED PROTEINS [J].

BOUTONNET, NS ;

ROOMAN, MJ ;

OCHAGAVIA, ME ;

RICHELLE, J ;

WODAK, SJ .

PROTEIN ENGINEERING, 1995, 8 (07) :647-662

[6] The ASTRAL compendium for protein structure and sequence analysis [J].

Brenner, SE ;

Koehl, P ;

Levitt, M .

NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :254-256

[7] MULTIPLE SEQUENCE ALIGNMENT WITH HIERARCHICAL-CLUSTERING [J].

CORPET, F .

NUCLEIC ACIDS RESEARCH, 1988, 16 (22) :10881-10890

[8] ProbCons: Probabilistic consistency-based multiple sequence alignment [J].

Do, CB ;

Mahabhashyam, MSP ;

Brudno, M ;

Batzoglou, S .

GENOME RESEARCH, 2005, 15 (02) :330-340

[9]

Durbin R., 1998, Biological sequence analysis: probabilistic models of proteins and nucleic acids

[10]

Edgar R. C., 2010, NUCL ACIDS RES
