Combining many multiple alignments in one improved alignment

Cited by: 26
Authors
Bucka-Lassen, K [1]
Caprani, O
Hein, J
Affiliations
[1] Object Oriented Ltd, CH-6004 Luzern, Switzerland
[2] Aarhus Univ, Dept Comp Sci, DK-8000 Aarhus C, Denmark
[3] Aarhus Univ, Dept Ecol & Genet, DK-8000 Aarhus C, Denmark
Keywords
DOI
10.1093/bioinformatics/15.2.122
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: The fact that the multiple sequence alignment problem is of high complexity has led to many different heuristic algorithms attempting to find a solution in what would be considered a reasonable amount of computation time and space. Very few of these heuristics produce results that are guaranteed always to lie within a certain distance of an optimal solution (given a measure of quality, e.g. parsimony). Most practical heuristics cannot guarantee this, but nevertheless perform well for certain cases. An alignment obtained with one of these heuristics and with a bad overall score is not unusable, though: it might contain important information on how substrings should be aligned. This paper presents a method that extracts qualitatively good sub-alignments from a set of multiple alignments and combines these into a new, often improved alignment. The algorithm is implemented as a variant of the traditional dynamic programming technique.
Results: An implementation of ComAlign (the algorithm that combines multiple alignments) has been run on several sets of artificially generated sequences and a set of 5S RNA sequences. To assess the quality of the alignments obtained, the results have been compared with the output of MSA 2.1 (Gupta et al., Proceedings of the Sixth Annual Symposium on Combinatorial Pattern Matching, 1995; Kececioglu et al., http://www.techfak.uni-bielefeld.de/bcd/Lectures/kececio-glu.html, 1995). In all cases, ComAlign was able to produce a solution with a score comparable to the solution obtained by MSA. The results also show that ComAlign actually does combine parts from different alignments and not just select the best of them.
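The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea of combining sub-alignments by dynamic programming, not the ComAlign algorithm itself. It assumes all input alignments cover the same sequences in the same row order, treats each alignment as a path through the lattice of sequence positions, and picks the cheapest path through the union of those paths. The sum-of-pairs costs (`mismatch=1`, `gap=2`) and the function names are assumptions made for the example.

```python
from itertools import combinations

GAP = "-"

def column_cost(col, mismatch=1, gap=2):
    """Sum-of-pairs cost of one column (assumed scoring; lower is better)."""
    cost = 0
    for a, b in combinations(col, 2):
        if a == GAP and b == GAP:
            continue
        if a == GAP or b == GAP:
            cost += gap
        elif a != b:
            cost += mismatch
    return cost

def edges_of(alignment):
    """Yield (start, end, column) lattice edges for one alignment (equal-length rows)."""
    pos = (0,) * len(alignment)
    for col in zip(*alignment):
        nxt = tuple(p + (c != GAP) for p, c in zip(pos, col))
        if nxt != pos:  # skip all-gap columns, which do not advance
            yield pos, nxt, col
        pos = nxt

def combine(alignments):
    """Return (cost, rows) of the cheapest path through the union of input paths."""
    out_edges = {}
    end = None
    for aln in alignments:
        for start, nxt, col in edges_of(aln):
            out_edges.setdefault(start, []).append((nxt, col))
            end = nxt
    origin = (0,) * len(alignments[0])
    best = {origin: (0, None, None)}  # node -> (cost, previous node, column)
    # Every edge increases the coordinate sum, so this is a topological order.
    for node in sorted(out_edges, key=sum):
        cost_here = best[node][0]
        for nxt, col in out_edges[node]:
            cand = cost_here + column_cost(col)
            if nxt not in best or cand < best[nxt][0]:
                best[nxt] = (cand, node, col)
    # Trace the chosen columns back from the end node.
    cols = []
    node = end
    while node != origin:
        _, prev, col = best[node]
        cols.append(col)
        node = prev
    cols.reverse()
    return best[end][0], ["".join(row) for row in zip(*cols)]

a = ["ACG-T", "AC-GT"]  # good first half, gapped second half
b = ["A-CGT", "AC-GT"]  # gapped first half, good second half
print(combine([a, b]))
```

In this toy input each alignment has cost 4 under the assumed scoring, while the combined path takes the first half of `a` and the second half of `b`, which mirrors the abstract's point that parts of different alignments are combined rather than the best one simply being selected.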
Pages: 122-130
Page count: 9
Related papers
11 in total
[1] CHAN, SC; WONG, AKC; CHIU, DKY. A SURVEY OF MULTIPLE SEQUENCE COMPARISON METHODS [J]. BULLETIN OF MATHEMATICAL BIOLOGY, 1992, 54 (04): 563-598
[2] Dijkstra E. W., 1959, NUMER MATH, V1, P269, DOI 10.1007/BF01386390
[3] FENG, DF; DOOLITTLE, RF. PROGRESSIVE SEQUENCE ALIGNMENT AS A PREREQUISITE TO CORRECT PHYLOGENETIC TREES [J]. JOURNAL OF MOLECULAR EVOLUTION, 1987, 25 (04): 351-360
[4] FULLEN G, 1997, GENTLE GUIDE MULTIPL
[5] GUPTA SK, 1995, P 6 ANN S COMB PATT
[6] GUSFIELD D, 1997, MULTIPLE STRING COMP, pCH13
[7] KECECIOGLU JD, 1995, DISCUSSION THEME MSA
[8] MYERS EW, 1991, 9129 TR U AR DEP COM
[9] SANKOFF D, 1972, P NATL ACAD SCI USA, V68, P4
[10] WATERMAN MS, 1991, PHYLOGENETIC ANAL DN, pCH4