Multiple DNA and protein sequence alignment based on segment-to-segment comparison

Cited by: 208
Authors
Morgenstern, B [1]
Dress, A [1]
Werner, T [1]
Affiliations
[1] UNIV BIELEFELD, FAK MATH, D-33501 BIELEFELD, GERMANY
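Keywords
sequence similarity; partial alignments; fragment comparison; dynamic programming; functional-site identification
DOI
10.1073/pnas.93.22.12098
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract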
In this paper, a new way to think about, and to construct, pairwise as well as multiple alignments of DNA and protein sequences is proposed. Rather than forcing alignments to either align single residues or to introduce gaps by defining an alignment as a path running right from the source up to the sink in the associated dot-matrix diagram, we propose to consider alignments as consistent equivalence relations defined on the set of all positions occurring in all sequences under consideration. We also propose constructing alignments from whole segments exhibiting highly significant overall similarity rather than by aligning individual residues. Consequently, we present an alignment algorithm that (i) is based on segment-to-segment comparison instead of the commonly used residue-to-residue comparison and (ii) avoids the well-known difficulties concerning the choice of appropriate gap penalties: gaps are not treated explicitly, but remain as those parts of the sequences that do not belong to any of the aligned segments. Finally, we discuss the application of our algorithm to two test examples and compare it with commonly used alignment methods. As a first example, we aligned a set of 11 DNA sequences coding for functional helix-loop-helix proteins. Though the sequences show only low overall similarity, our program correctly aligned all of the 11 functional sites, which was a unique result among the methods tested. As a by-product, the reading frames of the sequences were identified. Next, we aligned a set of ribonuclease H proteins and compared our results with alignments produced by other programs as reported by McClure et al. [McClure, M. A., Vasi, T. K. & Fitch, W. M. (1994) Mol. Biol. Evol. 11, 571-592]. Our program was one of the best scoring programs. However, in contrast to other methods, our protein alignments are independent of user-defined parameters.
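The pairwise idea described in the abstract, building an alignment from gap-free segment pairs and leaving everything else unaligned, can be illustrated with a minimal Python sketch. It is not the authors' program. The scoring used here (the number of identical residues in a maximal gap-free run, with a hypothetical `min_len` cutoff) is an assumption standing in for the paper's significance-based segment weights, and the names `find_fragments` and `chain_fragments` are invented for illustration. Only maximal runs are considered, so overlapping segments are never trimmed.

```python
from typing import List, Tuple

# (start in a, start in b, length, score); layout is an assumption of this sketch
Fragment = Tuple[int, int, int, int]


def find_fragments(a: str, b: str, min_len: int = 3) -> List[Fragment]:
    """Collect maximal gap-free runs of identical residues on every diagonal."""
    frags: List[Fragment] = []
    for d in range(-(len(b) - 1), len(a)):
        # Start position of diagonal d, where d = i - j.
        i, j = (d, 0) if d >= 0 else (0, -d)
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    frags.append((i - run, j - run, run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # run that reaches the end of the diagonal
            frags.append((i - run, j - run, run, run))
    return frags


def chain_fragments(frags: List[Fragment]) -> List[Fragment]:
    """Pick a maximum-score chain of fragments that keeps the order of both
    sequences. No gap penalty is applied: unaligned stretches cost nothing."""
    if not frags:
        return []
    # Any valid predecessor ends earlier in `a`, so it sorts first.
    frags = sorted(frags, key=lambda f: (f[0] + f[2], f[1] + f[2]))
    best = [f[3] for f in frags]   # best chain score ending at each fragment
    prev = [-1] * len(frags)       # back-pointers for traceback
    for k, (sa, sb, _, score) in enumerate(frags):
        for m in range(k):
            end_a = frags[m][0] + frags[m][2]
            end_b = frags[m][1] + frags[m][2]
            if end_a <= sa and end_b <= sb and best[m] + score > best[k]:
                best[k] = best[m] + score
                prev[k] = m
    k = max(range(len(frags)), key=best.__getitem__)
    chain: List[Fragment] = []
    while k != -1:
        chain.append(frags[k])
        k = prev[k]
    return chain[::-1]
```

For example, `chain_fragments(find_fragments("xxHELLOyyyWORLDzz", "HELLOqWORLD"))` selects the two segments `HELLO` and `WORLD`; the residues between and around them stay unaligned rather than being forced into a gapped path. Two segments that cross each other (one earlier in the first sequence but later in the second) cannot both be chosen, which is the pairwise form of the consistency requirement.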
Pages: 12098-12103
Page count: 6