PILER-CR: Fast and accurate identification of CRISPR repeats

Cited by: 243
Authors
Edgar, Robert C.
Affiliation
[1] Tiburon, CA
Keywords
DOI
10.1186/1471-2105-8-18
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences of similar length. The distinctive sequence signature of CRISPR repeats can be found using general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always ideal for studying these repeats, and significant effort is sometimes needed to build additional tools and perform manual analysis of the output.

Results: We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program executes rapidly, completing a 5 Mb genome in around 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, finding that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes.

Conclusion: PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain.
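
The sequence signature described in the abstract (short conserved repeats separated by unique spacers of similar length) can be illustrated with a toy scan. The sketch below is not the PILER-CR algorithm. It is a simplified stand-in that only matches exact k-mers, and the function name `find_repeat_arrays`, its parameters, and the demo sequence are all hypothetical.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len=8, min_spacer=4, max_spacer=12, min_copies=3):
    """Toy scan for exact k-mers that recur with similar-length gaps between copies.

    Illustrative only: real CRISPR finders tolerate mismatches and
    variable repeat lengths, which this sketch does not.
    """
    # Index the start position of every k-mer in the sequence.
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        chain = [starts[0]]
        for s in starts[1:]:
            # Gap between the end of the previous copy and the start of this one.
            spacer = s - (chain[-1] + repeat_len)
            if min_spacer <= spacer <= max_spacer:
                chain.append(s)
            else:
                if len(chain) >= min_copies:
                    arrays.append((kmer, chain))
                chain = [s]
        if len(chain) >= min_copies:
            arrays.append((kmer, chain))
    return arrays


# Hypothetical demo: four copies of one repeat with three distinct 6-base spacers.
repeat = "GTTTCAGA"
spacers = ["ACGTAC", "TTGACA", "CCATGG"]
demo = "AAAA" + repeat + "".join(sp + repeat for sp in spacers) + "TTTT"

print(find_repeat_arrays(demo))
# [('GTTTCAGA', [4, 18, 32, 46])]
```

Requiring at least three regularly spaced copies is an arbitrary threshold chosen to keep the toy output free of k-mers that recur only twice by chance.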
Pages: 6