Optimizing multiple seeds for protein homology search

Times cited: 8
Authors
Brown, DG [1]
Affiliation
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Keywords
bioinformatics database applications; similarity measures; biology and genetics
DOI
10.1109/TCBB.2005.13
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model picking a set of seed models as an integer programming problem and give algorithms to choose such a set of seeds. While the problem is NP-hard, and quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds we have chosen allows four to five times fewer false positive hits, while preserving essentially the same sensitivity as BLASTP.
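
The seed-selection idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this record: a vector seed is taken to be a weight vector plus a threshold that hits an ungapped alignment when the weighted sum of consecutive position scores reaches the threshold, and the integer program is replaced by a simple greedy covering heuristic. The names `seed_hits` and `greedy_seed_set`, the seeds, the position scores, and the false positive rates are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

# Assumed representation of a vector seed: (weight vector, threshold).
Seed = Tuple[Tuple[int, ...], int]


def seed_hits(seed: Seed, scores: Sequence[int]) -> bool:
    """Return True if the seed hits anywhere in the list of position scores.

    A hit at offset i means the dot product of the weights with
    scores[i:i+k] is at least the threshold (assumed definition).
    """
    weights, threshold = seed
    k = len(weights)
    for i in range(len(scores) - k + 1):
        window = scores[i:i + k]
        if sum(w * s for w, s in zip(weights, window)) >= threshold:
            return True
    return False


def greedy_seed_set(
    seeds: Sequence[Seed],
    fp_rates: Sequence[float],
    alignments: Sequence[Sequence[int]],
) -> List[int]:
    """Greedy stand-in for the integer program (an assumption, not the
    paper's algorithm): repeatedly pick the seed with the lowest false
    positive rate per newly covered alignment, until every alignment
    that some seed can hit is covered. Returns indices into `seeds`.
    """
    covered_by: List[Set[int]] = [
        {j for j, a in enumerate(alignments) if seed_hits(seed, a)}
        for seed in seeds
    ]
    uncovered: Set[int] = set().union(*covered_by)
    chosen: List[int] = []
    while uncovered:
        candidates = [i for i in range(len(seeds)) if covered_by[i] & uncovered]
        best = min(
            candidates,
            key=lambda i: fp_rates[i] / len(covered_by[i] & uncovered),
        )
        chosen.append(best)
        uncovered -= covered_by[best]
    return chosen


# Hypothetical toy data: two seeds and two alignments given as position scores.
toy_seeds: List[Seed] = [((1, 1, 1), 13), ((1, 0, 1), 12)]
toy_fp_rates = [0.001, 0.002]
toy_alignments = [[5, 4, 6, 0], [6, -1, 7, 2]]
selected = greedy_seed_set(toy_seeds, toy_fp_rates, toy_alignments)
```

In this toy setup each alignment is hit by only one of the two seeds, so both seeds are needed to cover the training set. An exact formulation would instead minimize the summed false positive rate subject to covering constraints, which is where the integer program mentioned in the abstract comes in.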
Pages: 29-38
Number of pages: 10