PatMatch: a program for finding patterns in peptide and nucleotide sequences

Cited by: 121
Authors
Yan, T
Yoo, D
Berardini, TZ
Mueller, LA
Weems, DC
Weng, S
Cherry, JM
Rhee, SY
Affiliations
[1] Carnegie Inst Sci, Dept Plant Biol, Stanford, CA 94305 USA
[2] Santa Clara Univ, Dept Comp Engn, Santa Clara, CA 95053 USA
[3] Natl Ctr Genome Resources, Santa Fe, NM 87505 USA
[4] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
Funding
US National Science Foundation;
Keywords
DOI
10.1093/nar/gki368
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497-498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265-1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at ftp://ftp.arabidopsis.org/home/tair/Software/Patmatch/. The PatMatch server is available on the web at http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl for searching Arabidopsis thaliana sequences.
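The idea of describing a pattern with ambiguous sequence codes and searching with regular expressions, optionally allowing mismatches, can be illustrated with a minimal Python sketch. This is not PatMatch's pattern syntax and not the NR-grep algorithm: it assumes only the standard IUPAC nucleotide ambiguity codes, uses Python's `re` module for exact matching, and falls back to a naive sliding-window count for mismatches. The function names (`iupac_to_regex`, `find_matches`, `find_with_mismatches`) are hypothetical and chosen for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC nucleotide pattern into a regular expression."""
    return "".join(IUPAC[ch] for ch in pattern.upper())


def find_matches(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


def find_with_mismatches(pattern, sequence, max_mismatches):
    """Naive approximate search: count positions that violate the pattern.

    Illustrative only; far slower than a bit-parallel tool such as NR-grep.
    """
    allowed = [set(IUPAC[ch].strip("[]")) for ch in pattern.upper()]
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - len(allowed) + 1):
        window = seq[start:start + len(allowed)]
        mismatches = sum(1 for base, ok in zip(window, allowed) if base not in ok)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    dna = "ttCACGTGaaCATATGcc"
    # CANNTG is the E-box consensus, written with the ambiguity code N.
    print(find_matches("CANNTG", dna))
    print(find_with_mismatches("CACGTG", dna, 2))
```

Exact matching is delegated to the regex engine, while the mismatch search simply counts disagreeing positions per window; a production tool would use a dedicated approximate-matching algorithm instead.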
Pages: W262-W266
Number of pages: 5
Related papers
12 in total
[1]   Genetic and physical maps of Saccharomyces cerevisiae [J].
Cherry, JM ;
Ball, C ;
Weng, S ;
Juvik, G ;
Schmidt, R ;
Adler, C ;
Dunn, B ;
Dwight, S ;
Riles, L ;
Mortimer, RK ;
Botstein, D .
NATURE, 1997, 387 (6632) :67-73
[2]  
DSOUZA M, 1997, TRENDS GENET, V13, P597
[3]   PatSearch: a program for the detection of patterns and structural motifs in nucleotide sequences [J].
Grillo, G ;
Licciulli, F ;
Liuni, S ;
Sbisà, E ;
Pesole, G .
NUCLEIC ACIDS RESEARCH, 2003, 31 (13) :3608-3612
[4]   The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant [J].
Huala, E ;
Dickerman, AW ;
Garcia-Hernandez, M ;
Weems, D ;
Reiser, L ;
LaFond, F ;
Hanley, D ;
Kiphart, D ;
Zhuang, MZ ;
Huang, W ;
Mueller, LA ;
Bhattacharyya, D ;
Bhaya, D ;
Sobral, BW ;
Beavis, W ;
Meinke, DW ;
Town, CD ;
Somerville, C ;
Rhee, SY .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :102-105
[5]   The EMOTIF database [J].
Huang, JY ;
Brutlag, DL .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :202-204
[6]   tacg - a grep for DNA - art. no. 8 [J].
Mangalam, HJ .
BMC BIOINFORMATICS, 2002, 3 (1)
[7]   NR-grep: a fast and flexible pattern-matching tool [J].
Navarro, G .
SOFTWARE-PRACTICE & EXPERIENCE, 2001, 31 (13) :1265-1312
[8]  
Navarro G, 1998, LECT NOTES COMPUT SC, V1448, P14, DOI 10.1007/BFb0030778
[9]   PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance [J].
Pesole, G ;
Liuni, S ;
D'Souza, M .
BIOINFORMATICS, 2000, 16 (05) :439-450
[10]   The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community [J].
Rhee, SY ;
Beavis, W ;
Berardini, TZ ;
Chen, GH ;
Dixon, D ;
Doyle, A ;
Garcia-Hernandez, M ;
Huala, E ;
Lander, G ;
Montoya, M ;
Miller, N ;
Mueller, LA ;
Mundodi, S ;
Reiser, L ;
Tacklind, J ;
Weems, DC ;
Wu, YH ;
Xu, I ;
Yoo, D ;
Yoon, J ;
Zhang, PF .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :224-228