DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment

Times cited: 103
Authors
Subramanian, AR
Weyer-Menkhoff, J
Kaufmann, M
Morgenstern, B
Affiliations
[1] Univ Gottingen, Inst Microbiol & Genet, D-37077 Gottingen, Germany
[2] Univ Tubingen, Wilhelm Schickard Inst Informat, D-72076 Tubingen, Germany
DOI
10.1186/1471-2105-6-66
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Background: We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared with the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally related sequence sets. However, it is often outperformed by these methods on data sets with global but weak similarity at the primary-sequence level.

Results: In the present paper, we discuss strengths and weaknesses of DIALIGN in view of the underlying objective function. Based on these results, we propose several heuristics to improve the segment-based alignment approach. For pairwise alignment, we implemented a fragment-chaining algorithm that favours chains of low-scoring local alignments over isolated high-scoring fragments. For multiple alignment, we use an improved greedy procedure that is less sensitive to spurious local sequence similarities. To evaluate our method on globally related protein families, we used the well-known database BAliBASE. For benchmarking tests on locally related sequences, we created a new reference database called IRMBASE, which consists of simulated conserved motifs implanted into non-related random sequences.

Conclusion: On BAliBASE, our new program performs significantly better than the previous version of DIALIGN and is comparable to the standard global aligner CLUSTAL W, though it is outperformed by some newly developed programs that focus on global alignment. On the locally related test sets in IRMBASE, our method outperforms all other programs that we evaluated.
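The abstract says the new pairwise step "favours chains of low-scoring local alignments over isolated high-scoring fragments", but it does not give the scoring scheme. The following is only a minimal, generic sketch of additive fragment chaining, not DIALIGN-T's actual algorithm or weighting. It assumes fragments are gap-free local alignments (diagonals) with precomputed positive lengths and weights; the `Fragment` type, the `best_chain` function, and all weights are hypothetical. It shows how, under plain summation of weights, a consistent chain of weak fragments can outscore a single strong fragment that crosses them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical gap-free local alignment between sequences X and Y.

    x, y: 0-based start positions in X and Y; length: aligned residues (> 0);
    weight: an illustrative score, NOT DIALIGN-T's fragment weight.
    """
    x: int
    y: int
    length: int
    weight: float

    def precedes(self, other: "Fragment") -> bool:
        # Consistency: self must end before other starts in BOTH sequences,
        # so the two fragments neither overlap nor cross.
        return (self.x + self.length <= other.x
                and self.y + self.length <= other.y)


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return the maximum total weight of a consistent chain and the chain.

    O(n^2) dynamic programming over fragments sorted by start position.
    """
    frags = sorted(fragments, key=lambda f: (f.x, f.y))
    n = len(frags)
    if n == 0:
        return 0.0, []
    score = [f.weight for f in frags]  # best chain weight ending at frags[k]
    prev = [-1] * n                    # back-pointers for the traceback
    for k in range(n):
        for j in range(k):
            if frags[j].precedes(frags[k]) and score[j] + frags[k].weight > score[k]:
                score[k] = score[j] + frags[k].weight
                prev[k] = j
    # Trace back from the end of the highest-scoring chain.
    k = max(range(n), key=lambda i: score[i])
    best = score[k]
    chain = []
    while k != -1:
        chain.append(frags[k])
        k = prev[k]
    chain.reverse()
    return best, chain


# One strong fragment that crosses or overlaps three weaker, mutually consistent ones.
strong = Fragment(x=10, y=0, length=5, weight=10.0)
weak = [Fragment(0, 5, 3, 4.0), Fragment(5, 10, 3, 4.0), Fragment(12, 15, 3, 4.0)]
total, chain = best_chain([strong] + weak)
# total == 12.0 and chain == weak: three fragments of weight 4.0 beat one of 10.0.
```

Plain additive chaining already lets many weak, consistent fragments beat one isolated strong fragment; how DIALIGN-T weights and selects fragments to strengthen this effect is described in the paper itself, not in this sketch.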
Pages: 13