A comparative genomic method for computational identification of prokaryotic translation initiation sites

Cited by: 11
Authors
Walker, M
Pavlovic, V
Kasif, S
Affiliations
[1] Boston Univ, Bioinformat Program, Boston, MA 02215 USA
[2] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
Funding
U.S. National Science Foundation
DOI
10.1093/nar/gkf423
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The ever-growing number of completely sequenced prokaryotic genomes facilitates cross-species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework for comparative genomic analysis and demonstrates its utility in the context of improving the accuracy of prokaryotic gene start site detection. Our framework employs a product hidden Markov model (PROD-HMM) with state architecture to model the species-specific trinucleotide frequency patterns in sequences immediately upstream and downstream of a translation start site and to detect the contrasting non-synonymous (amino acid changing) and synonymous (silent) substitution rates that differentiate prokaryotic coding from intergenic regions. Depending on the intricacy of the features modeled by the hidden state architecture, intergenic, regulatory, promoter and coding regions can be delimited by this method. The new system is evaluated using a preliminary set of orthologous Pyrococcus gene pairs, for which it demonstrates an improved accuracy of detection. Its robustness is confirmed by analysis with cross-validation of an experimentally verified set of Escherichia coli K-12 and Salmonella typhimurium LT2 orthologs. The novel architecture has a number of attractive features that distinguish it from previous comparative models such as pair-HMMs.
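The contrast between synonymous and non-synonymous substitutions mentioned in the abstract can be illustrated with a small counting sketch. This is not the PROD-HMM itself, only a hedged illustration of the underlying signal: in a correctly framed coding region, codon differences between two orthologs tend to be silent more often than in a wrong frame or in intergenic sequence. The function name `count_substitutions`, the `frame` parameter, and the example sequences are assumptions made for illustration, and the input is assumed to be a gap-free pairwise alignment read with the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions(seq_a, seq_b, frame=0):
    """Count codon differences between two aligned, gap-free sequences.

    Returns (synonymous, non_synonymous) for the given reading frame.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = non_synonymous = 0
    for i in range(frame, len(seq_a) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        # Skip identical codons and codons with ambiguous characters.
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Made-up example: GCT -> GCC is silent (Ala), GGT -> GAT changes Gly to Asp.
ortholog_a = "ATGGCTAAAGGT"
ortholog_b = "ATGGCCAAAGAT"
print(count_substitutions(ortholog_a, ortholog_b, frame=0))
```

Comparing the counts across the three frames, or across candidate start positions, gives a rough sense of where the coding frame begins. A probabilistic model such as the one described in the abstract would instead score these patterns jointly with the trinucleotide frequencies around the start site.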
Pages: 3181-3191
Number of pages: 11