Optimizing multiple seeds for protein homology search

Cited by: 8
Authors
Brown, DG [1]
Affiliation
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Keywords
bioinformatics database applications; similarity measures; biology and genetics
DOI
10.1109/TCBB.2005.13
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model picking a set of seed models as an integer programming problem and give algorithms to choose such a set of seeds. While the problem is NP-hard, and quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds we have chosen allows four to five times fewer false positive hits, while preserving essentially the same sensitivity as BLASTP.
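As a rough illustration of the seed-selection problem described in the abstract, the sketch below checks whether a vector seed hits an ungapped alignment and then picks a set of seeds with a simple greedy heuristic. It is not the paper's algorithm: the greedy rule stands in for the integer program, and the objective (reach a target sensitivity on training alignments while favouring seeds with low false positive rates) is an assumption about the formulation. The names `seed_hits`, `greedy_seed_set`, `covers`, `fp_rate`, and `min_sensitivity` are hypothetical.

```python
import math
from typing import Dict, List, Set


def seed_hits(scores: List[int], weights: List[int], threshold: int) -> bool:
    """Return True if the vector seed (weights, threshold) hits the alignment.

    `scores` holds the per-position substitution scores of an ungapped
    alignment. The seed hits at offset i when the weighted sum of the
    scores in the window starting at i reaches the threshold.
    """
    k = len(weights)
    return any(
        sum(w * s for w, s in zip(weights, scores[i:i + k])) >= threshold
        for i in range(len(scores) - k + 1)
    )


def greedy_seed_set(
    covers: Dict[str, Set[int]],   # seed name -> indices of training alignments it hits
    fp_rate: Dict[str, float],     # seed name -> estimated false positive rate (> 0)
    n_alignments: int,
    min_sensitivity: float,        # assumed constraint: fraction of alignments to hit
) -> List[str]:
    """Greedy heuristic (an assumption, not the paper's method): repeatedly
    add the seed with the most newly hit alignments per unit of false
    positive rate until the sensitivity target is reached."""
    needed = math.ceil(min_sensitivity * n_alignments - 1e-9)
    chosen: List[str] = []
    covered: Set[int] = set()
    while len(covered) < needed:
        candidates = [s for s in covers if s not in chosen and covers[s] - covered]
        if not candidates:
            raise ValueError("sensitivity target cannot be reached with these seeds")
        best = max(candidates, key=lambda s: len(covers[s] - covered) / fp_rate[s])
        chosen.append(best)
        covered |= covers[best]
    return chosen
```

In this sketch, `covers` would be built by running `seed_hits` for each candidate seed over a set of training alignments, and `fp_rate` would come from a background model of unrelated sequence pairs; both are placeholders for whatever estimates are available.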
Pages: 29-38
Page count: 10