Good spaced seeds for homology search

Cited by: 46
Authors
Choi, KP [1]
Zeng, FF
Zhang, LX
Affiliations
[1] Natl Univ Singapore, Dept Math, Singapore 117543, Singapore
[2] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore 117543, Singapore
[3] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
DOI
10.1093/bioinformatics/bth037
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Filtration is an important technique for speeding up local alignment, as exemplified by the BLAST programs. Recently, Ma et al. discovered that their PatternHunter program achieves better filtering when a local alignment is triggered by matches at positions spaced out according to a certain pattern rather than at contiguous positions. Such a match pattern is called a spaced seed.
Results: Our numerical computation shows that the ranking of spaced seeds by sensitivity changes with sequence similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. Using three arbitrarily chosen pairs of DNA genomic sequences, we validate that the listed spaced seeds are indeed more sensitive.
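Illustration (not part of the original abstract): a minimal Python sketch of what it means for a seed to "hit" a pair of sequences. It assumes the two sequences are already placed in an ungapped, position-by-position alignment, writes a seed as a string in which `1` marks a position that must match and `0` a position that is ignored, and uses toy sequences and a toy seed `1101` chosen only for this example. The function name `seed_hits` is hypothetical and is not taken from PatternHunter or BLAST.

```python
def seed_hits(a, b, seed):
    """Return the start offsets at which `seed` hits the ungapped alignment of a and b.

    A hit at offset s means a[s + i] == b[s + i] for every position i
    where seed[i] == '1'; positions marked '0' are not compared.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]  # positions that must match
    span = len(seed)                                    # window length covered by the seed
    n = min(len(a), len(b))
    return [
        s
        for s in range(n - span + 1)
        if all(a[s + i] == b[s + i] for i in care)
    ]


# Toy sequences (assumed for illustration); they differ at positions 3 and 8.
a = "ACGTACGTAC"
b = "ACGAACGTTC"

contiguous = seed_hits(a, b, "1111")  # four consecutive matches required
spaced = seed_hits(a, b, "1101")      # third position of the window is ignored

print(contiguous)  # [4]
print(spaced)      # [1, 4, 6]
```

The two toy seeds have different weights (four versus three required matches), so this example only shows how a hit is defined; it is not a sensitivity comparison of the kind the paper reports.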
Pages: 1053-1059
Page count: 7