Multiseed lossless filtration

Cited by: 44
Authors
Kucherov, G
Noé, L
Roytberg, M
Affiliations
[1] INRIA, LORIA, F-54602 Villers Les Nancy, France
[2] Inst Math Problems Biol, Pushchino, Moscow Region, Russia
Funding
Russian Foundation for Basic Research;
Keywords
filtration; string matching; gapped seed; gapped q-gram; local alignment; sequence similarity; seed family; multiple spaced seeds; dynamic programming; EST; oligonucleotide selection;
DOI
10.1109/TCBB.2005.12
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Kärkkäinen [1]. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
Pages: 51-61
Page count: 11
Related papers
25 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   [Anonymous], INT WORKSHOP UTILITY, DOI 10.1145/565196.565208, DOI 10.1145/1089827.1089839
[3]   [Anonymous], 2002, FLEXIBLE PATTERN MAT
[4]   BREJOVA B, 2003, P 3 ANN WORKSH ALG B, P39
[5]   BROWN D, 2004, P 4 ANN WORKSH ALG B, P170
[6]   BUHLER J, 2003, P 7 ANN INT C COMP M, P67
[7]   Burkhardt S, 2002, LECT NOTES COMPUT SC, V2373, P225
[8]   Burkhardt S, 2003, FUND INFORM, V56, P51
[9]   Califano A, 1993, Proc Int Conf Intell Syst Mol Biol, V1, P56
[10]   Sensitivity analysis and efficient method for identifying optimal spaced seeds [J].
Choi, KP ;
Zhang, LX .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2004, 68 (01) :22-40