A 'PolyORFomic' analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs

Cited by: 15
Authors
Harrison, PM
Carriero, N
Liu, Y
Gerstein, M
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
Keywords
gene annotation; bioinformatics; pseudogenes; hypothetical ORFs
DOI
10.1016/j.jmb.2003.09.016
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification codes
071010; 081704
Abstract
Prokaryote gene annotation is complicated by large numbers of short open reading frames (ORFs) that arise naturally from genetic code design. Historically, many hypothetical ORFs have been annotated as genes in microbes, usually with an arbitrary length threshold (e.g. greater than 100 codons). Given the use of such thresholds, what is the extent of genuine undiscovered short genes in the current sampling of prokaryote genomes? To assess rigorously the potential under-annotation of short ORFs with homology, we exhaustively compared the polyORFome, that is, all possible ORFs in 64 prokaryotes (53 bacteria and 11 archaea) plus budding yeast, to itself and to all known proteins. The novelty of our analysis is that, firstly, sequence comparisons to and between both annotated and un-annotated ORFs are considered, and secondly, a two-step disabled-homology filter is applied to set aside putative pseudogenes and spurious ORFs. We find that un-annotated homologous short ORFs (uhORFs) correspond to a small but non-negligible fraction of the annotated prokaryote proteomes (0.5-3.8%, depending on selection criteria). Moreover, the disabled-homology filter indicates that about a third of uhORFs correspond to putative pseudogenes or spurious ORFs. Our analysis shows that the use of annotation length thresholds is unnecessary, as there are manageable numbers of short ORF homologies conserved (without disablements) across microbial genomes. Data on uhORFs are available from http://pseudogene.org/polyo (C) 2003 Elsevier Ltd. All rights reserved.
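To make the notion of "all possible ORFs" concrete, the following is a minimal Python sketch of six-frame ORF enumeration without any imposed 100-codon cutoff. It rests on assumptions that the abstract does not state: an ORF is taken here to be a maximal stop-free run of codons (stop-to-stop), the standard stop codons TAA, TAG and TGA are used, and the names `all_orfs`, `orfs_in_frame` and `min_codons` are hypothetical. It is not the authors' pipeline, and it does not implement the homology comparison or the disabled-homology filter.

```python
# Illustrative sketch only: stop-to-stop ORF enumeration in six frames.
# The ORF definition and all names here are assumptions, not the paper's method.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for each stop-free codon run in one reading frame.

    Coordinates are 0-based, half-open, relative to `seq`; the stop codon
    itself is excluded from the run.
    """
    run_start = frame
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            if (pos - run_start) // 3 >= min_codons:
                yield run_start, pos
            run_start = pos + 3
        pos += 3
    # Emit the trailing run that reaches the end of the sequence.
    if (pos - run_start) // 3 >= min_codons:
        yield run_start, pos


def all_orfs(seq, min_codons=1):
    """Return (strand, frame, start, end, orf_seq) for all six frames.

    For the "-" strand, coordinates refer to the reverse-complemented string.
    """
    seq = seq.upper()
    result = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                result.append((strand, frame, start, end, s[start:end]))
    return result
```

Calling `all_orfs(genome, min_codons=1)` keeps even single-codon runs, whereas `min_codons=100` would mimic the kind of arbitrary annotation threshold the abstract argues against.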
Pages: 885-892
Number of pages: 8