Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models

Cited by: 87
Authors
Yada, T
Nakao, M
Totoki, Y
Nakai, K
Affiliations
[1] Univ Tokyo, Inst Med Sci, Lab Genome Database, Ctr Human Genome, RIKEN, Genom Sci Ctr, Minato Ku, Tokyo 1088639, Japan
[2] Kyoto Univ, Chem Res Inst, Kyoto 6110011, Japan
Funding
Japan Society for the Promotion of Science
DOI
10.1093/bioinformatics/15.12.987
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: The hidden Markov model (HMM) is a valuable technique for gene-finding, largely because its flexibility allows various sequence features to be included. Recent programs for bacterial gene-finding exploit this flexibility by incorporating information on the ribosomal binding site (RBS) to improve recognition of the start codon. We report here our attempt to extend the model to the entire transcriptional unit, enabling the prediction of operon structures.
Results: First, we improved the prediction accuracy of coding sequences (CDSs) by employing models of 'typical', 'atypical' and 'negative (false-positive)' classes, as well as models of the RBS and its downstream spacer. The sensitivity of exactly predicting the 204 experimentally confirmed CDSs reached 90.2% in an objective test. Based on the CDS predictions, the positions of promoters and terminators were then predicted. Our model exactly recognized 60% of 390 known transcriptional units. This prediction problem is thus significant and far from trivial. We propose it as an open problem in bioinformatics, because the ongoing or planned post-sequencing projects will produce much data for future improvements.
Pages: 987-993
Number of pages: 7