A probabilistic method for identifying start codons in bacterial genomes

Cited by: 145
Authors
Suzek, BE
Ermolaeva, MD
Schreiber, M
Salzberg, SL [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Inst Genom Res, Rockville, MD 20850 USA
[3] Univ Otago, Dept Biochem, Dunedin, New Zealand
DOI
10.1093/bioinformatics/17.12.1123
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities of 98-99% or higher (Delcher et al., Nucleic Acids Res., 27, 4636-4641, 1999). These accuracy figures are calculated by comparing the locations of verified stop codons to the predictions. Determining the accuracy of start codon prediction is more problematic, however, due to the relatively small number of start sites that have been confirmed by independent, non-computational methods. Nonetheless, the accuracy of gene finders at predicting the exact gene boundaries at both the 5' and 3' ends of genes is of critical importance for microbial genome annotation, especially in light of the important signaling information that is sometimes found on the 5' end of a protein coding region. In this paper we propose a probabilistic method to improve the accuracy of gene identification systems at finding precise translation start sites. The new system, RBSfinder, is tested on a validated set of genes from Escherichia coli, for which it improves the accuracy of start site locations predicted by computational gene finding systems from the range 67-77% to 90% correct.
Pages: 1123-1130
Page count: 8
Related papers
14 in total
[1]  
[Anonymous], GENES
[2]   The complete genome sequence of Escherichia coli K-12 [J].
Blattner, FR ;
Plunkett, G ;
Bloch, CA ;
Perna, NT ;
Burland, V ;
Riley, M ;
ColladoVides, J ;
Glasner, JD ;
Rode, CK ;
Mayhew, GF ;
Gregor, J ;
Davis, NW ;
Kirkpatrick, HA ;
Goeden, MA ;
Rose, DJ ;
Mau, B ;
Shao, Y .
SCIENCE, 1997, 277 (5331) :1453-+
[3]  
Claverie JM, 1996, COMPUT APPL BIOSCI, V12, P431
[4]   Improved microbial gene identification with GLIMMER [J].
Delcher, AL ;
Harmon, D ;
Kasif, S ;
White, O ;
Salzberg, SL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (23) :4636-4641
[5]   WHOLE-GENOME RANDOM SEQUENCING AND ASSEMBLY OF HAEMOPHILUS-INFLUENZAE RD [J].
FLEISCHMANN, RD ;
ADAMS, MD ;
WHITE, O ;
CLAYTON, RA ;
KIRKNESS, EF ;
KERLAVAGE, AR ;
BULT, CJ ;
TOMB, JF ;
DOUGHERTY, BA ;
MERRICK, JM ;
MCKENNEY, K ;
SUTTON, G ;
FITZHUGH, W ;
FIELDS, C ;
GOCAYNE, JD ;
SCOTT, J ;
SHIRLEY, R ;
LIU, LI ;
GLODEK, A ;
KELLEY, JM ;
WEIDMAN, JF ;
PHILLIPS, CA ;
SPRIGGS, T ;
HEDBLOM, E ;
COTTON, MD ;
UTTERBACK, TR ;
HANNA, MC ;
NGUYEN, DT ;
SAUDEK, DM ;
BRANDON, RC ;
FINE, LD ;
FRITCHMAN, JL ;
FUHRMANN, JL ;
GEOGHAGEN, NSM ;
GNEHM, CL ;
MCDONALD, LA ;
SMALL, KV ;
FRASER, CM ;
SMITH, HO ;
VENTER, JC .
SCIENCE, 1995, 269 (5223) :496-512
[6]   Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12 [J].
Link, AJ ;
Robison, K ;
Church, GM .
ELECTROPHORESIS, 1997, 18 (08) :1259-1313
[7]   GeneMark.hmm: new solutions for gene finding [J].
Lukashin, AV ;
Borodovsky, M .
NUCLEIC ACIDS RESEARCH, 1998, 26 (04) :1107-1115
[8]  
MIKONNEN M, 1994, FEMS MICROBIOL LETT, V116, P315
[9]   Analysis of base-pairing potentials between 16S rRNA and 5′ UTR for translation initiation in various prokaryotes [J].
Osada, Y ;
Saito, R ;
Tomita, M .
BIOINFORMATICS, 1999, 15 (7-8) :578-581
[10]   EcoGene: a genome sequence database for Escherichia coli K-12 [J].
Rudd, KE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :60-64