JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions

Cited by: 47
Authors
Allen, Jonathan E.
Majoros, William H.
Pertea, Mihaela
Salzberg, Steven L. [1]
Affiliations
[1] Univ Maryland, Ctr Bioinformat & Comp Biol, College Pk, MD 20742 USA
[2] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[3] Duke Univ, Inst Genome Sci & Policy, Durham, NC 27708 USA
DOI
10.1186/gb-2006-7-s1-s9
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to emerge that can provide near perfect levels of sensitivity and specificity at the level of whole genes. As an incremental step in this direction, it is hoped that controlled gene finding experiments in the ENCODE regions will provide a more accurate view of the relative benefits of different strategies for modeling and predicting gene structures.
Results: Here we describe our general-purpose eukaryotic gene finding pipeline and its major components, as well as the methodological adaptations that we found necessary in accommodating human DNA in our pipeline, noting that a similar level of effort may be necessary by ourselves and others with similar pipelines whenever a new class of genomes is presented to the community for analysis. We also describe a number of controlled experiments involving the differential inclusion of various types of evidence and feature states into our models and the resulting impact these variations have had on predictive accuracy.
Conclusions: While in the case of the non-comparative gene finders we found that adding model states to represent specific biological features did little to enhance predictive accuracy, for our evidence-based 'combiner' program the incorporation of additional evidence tracks tended to produce significant gains in accuracy for most evidence types, suggesting that improved modeling efforts at the hidden Markov model level are of relatively little value. We relate these findings to our current plans for future research.
Pages: 13