A general method for fast multiple sequence alignment

被引:26
作者
Tonges, U [1 ]
Perrey, SW [1 ]
Stoye, J [1 ]
Dress, AWM [1 ]
机构
[1] MASSEY UNIV,DEPT MATH,PALMERSTON NORTH,NEW ZEALAND
来源
GENE-COMBIS | 1996年 / 172卷
关键词
dynamic programming; secondary matrix; divide and conquer; pair-wise sequence alignment; multiple sequence alignment; PROTEIN SEQUENCES; ACID SEQUENCES; LINEAR-SPACE; ALGORITHM; CORRECT;
D O I
10.1016/0378-1119(96)00123-0
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
We have developed a fast heuristic algorithm for multiple sequence alignment which provides near-to-optimal results for sufficiently homologous sequences. The algorithm makes use of the standard dynamic programming procedure by applying it to all pairs of sequences. The resulting score matrices for pair-wise alignment give rise to secondary matrices containing the additional charges imposed by forcing the alignment path to run through a particular vertex Such a constraint corresponds to slicing the sequences at the positions defining that vertex, and aligning the remaining pairs of prefix and suffix sequences separately. From these secondary matrices, one can compute - for any given family of sequences - suitable positions for cutting all of these sequences simultaneously, thus reducing the problem of aligning a family of n sequences of average length I in a Divide and Conquer fashion to aligning two families of n sequences of approximately half that length. In this paper, we explain the method for the case of 3 sequences in detail, and we demonstrate its potential and its limits by discussing its behaviour for several test families. A generalization for aligning more than 3 sequences is lined out, and some actual alignments constructed by our algorithm for various user-defined parameters are presented.
引用
收藏
页码:GC33 / GC41
页数:9
相关论文
共 38 条
[1]   A FAST ALGORITHM FOR THE OPTIMAL ALIGNMENT OF 3 STRINGS [J].
ALLISON, L .
JOURNAL OF THEORETICAL BIOLOGY, 1993, 164 (02) :261-269
[2]   GAP COSTS FOR MULTIPLE SEQUENCE ALIGNMENT [J].
ALTSCHUL, SF .
JOURNAL OF THEORETICAL BIOLOGY, 1989, 138 (03) :297-309
[3]   WEIGHTS FOR DATA RELATED BY A TREE [J].
ALTSCHUL, SF ;
CARROLL, RJ ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1989, 207 (04) :647-653
[4]  
[Anonymous], 1995, Introduction to computational biology: maps, sequences and genomes
[5]  
BAFNA V, 1994, P 5 S COMB PATT MATC, V807, P43
[6]   MEDIANS IN MEDIAN GRAPHS [J].
BANDELT, HJ ;
BARTHELEMY, JP .
DISCRETE APPLIED MATHEMATICS, 1984, 8 (02) :131-142
[7]  
CARILLO H, 1988, SIAM J APPL MATH, V48, P1073
[8]  
CHAN SC, 1992, B MATH BIOL, V54, P563, DOI 10.1016/S0092-8240(05)80077-1
[9]  
DRESS AWM, 1995, P 3 C INT SYST MOL B, P107
[10]   PROGRESSIVE SEQUENCE ALIGNMENT AS A PREREQUISITE TO CORRECT PHYLOGENETIC TREES [J].
FENG, DF ;
DOOLITTLE, RF .
JOURNAL OF MOLECULAR EVOLUTION, 1987, 25 (04) :351-360