Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis

被引:58
作者
Lin, YL
Jiang, T [1 ]
Chao, KM
机构
[1] Univ Calif Riverside, Dept Comp Sci, Riverside, CA 92521 USA
[2] Providence Univ, Dept Comp Sci & Informat Management, Shalu 433, Taichung County, Taiwan
[3] Natl Yang Ming Univ, Dept Life Sci, Taipei 112, Taiwan
基金
美国国家科学基金会;
关键词
algorithm; efficiency; maximum consecutive subsequence; length constraint; biomolecular sequence analysis; ungapped local alignment;
D O I
10.1016/S0022-0000(02)00010-7
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We study two fundamental problems concerning the search for interesting, regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)-time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignment. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful (such as GC-rich) regions. (C) 2002 Published by Elsevier Science (USA).
引用
收藏
页码:570 / 586
页数:17
相关论文
共 24 条
[1]  
Alexandrov N N, 1998, Pac Symp Biocomput, P463
[2]   A new approach to sequence comparison:: normalired sequence alignment [J].
Arslan, AN ;
Egecioglu, Ö ;
Pevzner, PA .
BIOINFORMATICS, 2001, 17 (04) :327-337
[3]  
Bafna V, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P3
[4]  
Bentley J., 1986, PROGRAMMING PEARLS, VSecond
[5]  
BOGUSKI MS, 1992, NEW BIOL, V4, P247
[6]  
Cormen T. H., 1990, INTRO ALGORITHMS
[7]   CPG ISLANDS IN VERTEBRATE GENOMES [J].
GARDINERGARDEN, M ;
FROMMER, M .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 196 (02) :261-282
[8]   SEQUENCE AND COMPARATIVE-ANALYSIS OF THE RABBIT ALPHA-LIKE GLOBIN GENE-CLUSTER REVEALS A RAPID MODE OF EVOLUTION IN A G+C-RICH REGION OF MAMMALIAN GENOMES [J].
HARDISON, R ;
KRANE, D ;
VANDENBERGH, D ;
CHENG, JF ;
MANSBERGER, J ;
TADDIE, J ;
SCHWARTZ, S ;
HUANG, XQ ;
MILLER, W .
JOURNAL OF MOLECULAR BIOLOGY, 1991, 222 (02) :233-249
[9]   Locus control regions of mammalian β-globin gene clusters:: combining phylogenetic analyses and experimental results to gain functional insights [J].
Hardison, R ;
Slightom, JL ;
Gumucio, DL ;
Goodman, M ;
Stojanovic, N ;
Miller, W .
GENE, 1997, 205 (1-2) :73-94
[10]  
HUANG XQ, 1994, COMPUT APPL BIOSCI, V10, P219