GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences

Cited by: 11
Authors
Antonov, Ivan [1]
Baranov, Pavel [2]
Borodovsky, Mark [1, 3, 4]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[2] Natl Univ Ireland Univ Coll Cork, Dept Biochem, Cork, Ireland
[3] Moscow Inst Phys & Technol, Dept Mol & Biol Phys, Dolgoprudnyi, Moscow Region, Russia
[4] Georgia Inst Technol, Dept Biomed Engn, Atlanta, GA 30332 USA
Funding
Wellcome Trust (UK);
Keywords
CODING SEQUENCES; DECAY; IDENTIFICATION; EXPRESSION; PHASE;
DOI
10.1093/nar/gks1062
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) can be caused by sequencing errors or by indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events that evolved to control the expression of some genes. Earlier, we developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved through a web interface by similarity search against a given query sequence, by browsing fs-gene clusters, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).
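To illustrate what "conceptually translated fs-genes" with a frameshift direction of -1 or +1 could mean in practice, here is a minimal Python sketch. It is not GeneTack's code or the database's procedure. The function names (`translate`, `translate_fs_gene`), the 0-based `fs_pos` coordinate, and the convention that +1 skips one nucleotide while -1 re-reads one nucleotide are all assumptions made for this illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; a trailing partial codon is ignored."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def translate_fs_gene(dna, fs_pos, direction):
    """Conceptually translate a gene across a single frameshift.

    Assumptions (hypothetical, for illustration only):
    - fs_pos is a 0-based index, a multiple of 3, marking the end of the
      last codon read in the upstream frame;
    - direction +1 skips one nucleotide, -1 re-reads one nucleotide.
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    if fs_pos % 3 != 0 or not 3 <= fs_pos <= len(dna):
        raise ValueError("fs_pos must be a codon boundary inside the gene")
    upstream = dna[:fs_pos]
    downstream = dna[fs_pos + direction:]
    return translate(upstream) + translate(downstream)


if __name__ == "__main__":
    # +1 example: the extra "A" after ATG GCT is skipped.
    print(translate_fs_gene("ATGGCTAGAATGA", 6, +1))
    # -1 example: the last "A" of GCA is read a second time.
    print(translate_fs_gene("ATGGCAAATGA", 6, -1))
```

Under these assumptions, two fs-proteins produced this way could then be compared by sequence similarity, frameshift position and frameshift direction, which are the three clustering criteria the abstract names.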
Pages: D152-D156
Page count: 5