Using database matches with HMMGene for automated gene detection in Drosophila

Cited by: 52
Authors
Krogh, A [1]
Affiliations
[1] Tech Univ Denmark, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
DOI
10.1101/gr.10.4.523
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The application of the gene finder HMMGene to the Adh region of Drosophila melanogaster is described, and the prediction results are analyzed. HMMGene is based on a probabilistic model called a hidden Markov model, and the probabilistic framework facilitates the inclusion of database matches of varying degrees of certainty. It is shown that database matches clearly improve the performance of the gene finder. For instance, the sensitivity for coding exons predicted with both ends correct grows from 62% to 70% on a high-quality test set when matches to proteins, cDNAs, repeats, and transposons are included. When ESTs are used, the specificity drops more than the sensitivity increases. This is due to the high noise level in EST matches; the causes of this noise and how the use of ESTs might be improved are discussed in more detail.
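As an illustration of the exon-level measures quoted in the abstract, the sketch below counts a predicted exon as correct only when both of its ends match an annotated exon exactly. It assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of annotated exons that are predicted correctly and specificity is the fraction of predicted exons that are correct; the abstract does not spell out its definitions. The function name `exon_level_accuracy` and all coordinates are hypothetical, not taken from the paper.

```python
def exon_level_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the exon level.

    Each exon is a (start, end) tuple. A prediction is correct only
    when both ends coincide with an annotated exon.
    Assumed definitions (not stated in the abstract):
      sensitivity = correct / number of annotated exons
      specificity = correct / number of predicted exons
    """
    predicted_set = set(predicted)
    annotated_set = set(annotated)
    correct = len(predicted_set & annotated_set)
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical coordinates, for illustration only.
annotated = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted = [(100, 200), (300, 460), (600, 700)]  # second exon has one wrong end

sn, sp = exon_level_accuracy(predicted, annotated)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under these definitions, a noisy evidence source that triggers extra exon predictions can raise sensitivity slightly while lowering specificity by a larger amount, which is the pattern the abstract reports for EST matches.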
Pages: 523-528
Number of pages: 6