A simple and space-efficient fragment-chaining algorithm for alignment of DNA and protein sequences

被引:32
作者
Morgenstern, B [1 ]
机构
[1] GSF Natl Res Ctr Environm & Hlth, Inst Biomath & Biometry, D-85764 Neuherberg, Germany
关键词
sequence alignment; string algorithm; fragment chaining; dynamic programming; complexity;
D O I
10.1016/S0893-9659(01)00085-4
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
In the segment-based approach to sequence alignment, nucleic acid, and protein sequence alignments are constructed from fragments, i.e., from pairs of ungapped segments of the input sequences. Given a set F of candidate fragments and a weighting function w : F --> R-0(+), the score of an alignment is defined as the sum of weights of the fragments it consists of, and the optimization problem is to find a consistent collection of pairwise disjoint fragments with maximum sum of weights. Herein, a sparse dynamic programming algorithm is described that solves the pairwise segment-alignment problem in O(L + N-max) space where L is the maximum length of the input sequences while N-max less than or equal to #F holds. With a recently introduced weighting function w, small sets F of candidate fragments are sufficient to obtain alignments of high quality, As a result, the proposed algorithm runs in essentially linear space. (C) 2001 Elsevier Science Ltd. All rights reserved.
引用
收藏
页码:11 / 16
页数:6
相关论文
共 14 条
[11]   OPTIMAL ALIGNMENTS IN LINEAR-SPACE [J].
MYERS, EW ;
MILLER, W .
COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1988, 4 (01) :11-17
[12]   A GENERAL METHOD APPLICABLE TO SEARCH FOR SIMILARITIES IN AMINO ACID SEQUENCE OF 2 PROTEINS [J].
NEEDLEMAN, SB ;
WUNSCH, CD .
JOURNAL OF MOLECULAR BIOLOGY, 1970, 48 (03) :443-+
[13]   BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs [J].
Thompson, JD ;
Plewniak, F ;
Poch, O .
BIOINFORMATICS, 1999, 15 (01) :87-88
[14]   RAPID SIMILARITY SEARCHES OF NUCLEIC-ACID AND PROTEIN DATA BANKS [J].
WILBUR, WJ ;
LIPMAN, DJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA-BIOLOGICAL SCIENCES, 1983, 80 (03) :726-730