DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES

Cited by: 88
Authors
BORODOVSKY, M
MCININCH, JD
KOONIN, EV
RUDD, KE
MEDIGUE, C
DANCHIN, A
Affiliations
[1] NATL LIB MED, NATL CTR BIOTECHNOL INFORMAT, BETHESDA, MD 20894 USA
[2] INST CURIE, F-75231 PARIS 05, FRANCE
[3] INST PASTEUR, F-75724 PARIS 15, FRANCE
DOI
10.1093/nar/23.17.3554
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
We further investigated the statistical features of the three classes of Escherichia coli genes that have been previously delineated by factorial correspondence analysis and dynamic clustering methods. A phased Markov model for a nucleotide sequence of each gene class was developed and employed for gene prediction using the GeneMark program. The protein-coding region prediction accuracy was determined for class-specific Markov models of different orders when the programs implementing these models were applied to gene sequences from the same or other classes. It is shown that at least two training sets and two program versions derived for different classes of E. coli genes are necessary in order to achieve a high accuracy of coding region prediction for uncharacterized sequences. Some annotated E. coli genes from Class I and Class III are shown to be spurious, whereas many open reading frames (ORFs) that have not been annotated in GenBank as genes are predicted to encode proteins. The amino acid sequences of the putative products of these ORFs initially did not show similarity to already known proteins. However, conserved regions have been identified in several of them by screening the latest entries in protein sequence databases and applying methods for motif search, while some of the other new genes have been identified in independent experiments.
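The idea of class-specific phased Markov models mentioned in the abstract can be illustrated with a minimal sketch. This is not the GeneMark implementation; it only assumes that a "phased" model keeps a separate set of transition probabilities for each of the three codon positions, and that a sequence can be assigned to the class whose model gives it the highest log-likelihood. The function names, the toy training sequences, the class labels "A" and "B", and the add-one pseudocount are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_phased_markov(sequences, order=2, pseudocount=1.0):
    """Count phase-specific transitions from in-frame coding sequences.

    Assumption: every training sequence starts at codon position 0,
    so the phase of position i is simply i % 3.
    """
    counts = defaultdict(float)  # (phase, context, base) -> count
    totals = defaultdict(float)  # (phase, context) -> count
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            counts[(i % 3, context, base)] += 1.0
            totals[(i % 3, context)] += 1.0
    return {
        "order": order,
        "pseudocount": pseudocount,
        "counts": dict(counts),
        "totals": dict(totals),
    }


def log_likelihood(model, seq, frame=0):
    """Log-probability of seq under the model, read in the given frame."""
    order, alpha = model["order"], model["pseudocount"]
    score = 0.0
    for i in range(order, len(seq)):
        phase = (i - frame) % 3
        context, base = seq[i - order:i], seq[i]
        # Pseudocounts keep unseen transitions from having zero probability.
        num = model["counts"].get((phase, context, base), 0.0) + alpha
        den = model["totals"].get((phase, context), 0.0) + alpha * len(ALPHABET)
        score += math.log(num / den)
    return score


def best_class(models, seq):
    """Return the name of the class model that scores seq highest."""
    return max(models, key=lambda name: log_likelihood(models[name], seq))


# Toy example: two hypothetical gene classes with different codon usage.
models = {
    "A": train_phased_markov(["ATGGCTGCTGCTGCTGCTTAA"]),
    "B": train_phased_markov(["ATGAAAAAGAAAAAGAAATAA"]),
}
query = "ATGGCTGCTGCTTAA"
predicted = best_class(models, query)
```

A sequence scored with the model of the wrong class receives a lower likelihood, which is consistent with the abstract's point that more than one training set is needed for accurate prediction across gene classes. A real gene finder would also compare against a non-coding model and scan all six reading frames, which this sketch omits.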
Pages: 3554-3562
Number of pages: 9