Superiority and Complexity of the Spaced Seeds

被引:28
作者
Li, Ming [1 ]
Ma, Bin [2 ]
Zhang, Louxin [3 ]
机构
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[2] Univ Western Ontario, Dept Comp Sci, London, ON N6A 5B8, Canada
[3] Natl Univ Singapore, Dept Math, Singapore 117543, Singapore
来源
PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS | 2006年
关键词
D O I
10.1145/1109557.1109607
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Optimal spaced seeds were introduced by the theoretical computer science community to bioinformatics to effectively increase homology search sensitivity. They are now serving thousands of homology search queries daily. While dozens of papers have been published on optimal spaced seeds since their invention, many fundamental questions still remain unanswered. In this paper, we settle several open questions in this area. Specifically, we prove that when the length of a non-uniformly spaced seed is bounded by an exponential function of the seed weight, the seed outperforms strictly the traditional consecutive seed in both (i) the average number of non-overlapping hits and (ii) the asymptotic hit probability. Then, we study the computation of the hit probability of a spaced seed, solving three more open questions: (iii) hit probability computation in a uniform homologous region is NP-hard and (iv) it admits a PTAS; (v) the asymptotic hit probability is computable in exponential time in seed length, independent of the homologous region length.
引用
收藏
页码:444 / +
页数:2
相关论文
共 34 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]  
BALAKRISHNAN N, 2002, RUNS SCANS APPLICATI
[4]  
Birkhoff Garrett, 1957, T AM MATH SOC, V85, P219, DOI 10.2307/1992971
[5]  
BROWN BD, 2004, J BIOINF COMPUT BIOL, V1, P595
[6]  
BULLLER J, 2007, P 7 ANN INT C COMP F, P67
[7]  
CAREY MR, 1979, GUIDE THEORY NP COMP
[8]   Good spaced seeds for homology search [J].
Choi, KP ;
Zeng, FF ;
Zhang, LX .
BIOINFORMATICS, 2004, 20 (07) :1053-1059
[9]   Sensitivity analysis and efficient method for identifying optimal spaced seeds [J].
Choi, KP ;
Zhang, LX .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2004, 68 (01) :22-40
[10]  
Chung YY, 2004, P INT COMP SOFTW APP, P54