Finding genes in DNA with a Hidden Markov Model

Cited by: 127
Authors
Henderson, J
Salzberg, S
Fasman, KH
Affiliations
[1] JOHNS HOPKINS UNIV,DIV BIOMED INFORMAT SCI,BALTIMORE,MD 21218
[2] MIT,CTR GENOME RES,CAMBRIDGE,MA 02139
DOI
10.1089/cmb.1997.4.127
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
This study describes a new Hidden Markov Model (HMM) system for segmenting uncharacterized genomic DNA sequences into exons, introns, and intergenic regions. Separate HMM modules were designed and trained for specific regions of DNA: exons, introns, intergenic regions, and splice sites. The models were then tied together to form a biologically feasible topology. The integrated HMM was trained further on a set of eukaryotic DNA sequences and tested by using it to segment a separate set of sequences. The resulting HMM system, which is called VEIL (Viterbi Exon-Intron Locator), obtains an overall accuracy on test data of 92% of total bases correctly labelled, with a correlation coefficient of 0.73. Using the more stringent test of exact exon prediction, VEIL correctly located both ends of 53% of the coding exons, and 49% of the exons it predicts are exactly correct. These results compare favorably to the best previous results for gene structure prediction and demonstrate the benefits of using HMMs for this problem.
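As a rough illustration of the approach the abstract describes, the sketch below runs Viterbi decoding over a three-state HMM and scores the result by the fraction of correctly labelled bases. It is a minimal toy under stated assumptions, not the VEIL model: the state set, every probability in START, TRANS, and EMIT, and the function names viterbi and base_accuracy are hypothetical and chosen only for illustration. The actual system uses separately trained modules, including splice-site models, that are not reproduced here.

```python
import math

STATES = ("intergenic", "exon", "intron")

# Hypothetical toy parameters for illustration only; NOT the trained VEIL model.
START = {"intergenic": 0.8, "exon": 0.1, "intron": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.90, "exon": 0.10, "intron": 0.00},
    "exon":       {"intergenic": 0.05, "exon": 0.85, "intron": 0.10},
    "intron":     {"intergenic": 0.00, "exon": 0.10, "intron": 0.90},
}
EMIT = {
    "intergenic": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "exon":       {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "intron":     {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}


def _log(p):
    """Natural log that maps a zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state label for each base of seq."""
    if not seq:
        return []
    # score[i][s]: best log-probability of any path ending in state s at base i.
    score = [{s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at base i + 1
    for base in seq[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + _log(TRANS[r][s]))
            cur[s] = prev[best] + _log(TRANS[best][s]) + _log(EMIT[s][base])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def base_accuracy(predicted, truth):
    """Fraction of bases whose predicted label matches the true label."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label sequences must be non-empty and equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

Calling viterbi on a DNA string returns one label per base, and base_accuracy compares such a labelling against a reference annotation. This corresponds to the per-base accuracy measure in the abstract, but not to the correlation coefficient or the exact-exon measures.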
Pages: 127-141
Page count: 15