JIGSAW: integration of multiple sources of evidence for gene prediction

Cited by: 87
Authors
Allen, JE [1]
Salzberg, SL
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Adv Comp Studies, Dept Comp Sci, College Pk, MD 20742 USA
[3] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
DOI
10.1093/bioinformatics/bti609
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models.
Results: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods.
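The two steps named in the abstract (weighting evidence from training statistics, then combining it with dynamic programming) can be illustrated with a toy sketch. This is not JIGSAW's actual model: it labels single positions as coding or noncoding, ignores exons, splice sites and reading frames, and every name in it (`train_weights`, `combine`, `switch_penalty`, the source names "good" and "noisy") is a hypothetical chosen for illustration.

```python
from collections import defaultdict

LABELS = ("coding", "noncoding")


def train_weights(training_evidence, training_truth):
    """Estimate, per source and predicted label, how often that call is correct."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, calls in training_evidence.items():
        for call, truth in zip(calls, training_truth):
            total[(source, call)] += 1
            if call == truth:
                correct[(source, call)] += 1
    # Add-one smoothing (an assumption of this sketch, not taken from the paper).
    return {key: (correct[key] + 1) / (total[key] + 2) for key in total}


def combine(evidence, weights, switch_penalty=0.5):
    """Viterbi-style DP: maximise weighted votes minus a penalty per label switch."""
    n = len(next(iter(evidence.values())))

    def vote(pos, label):
        # Sum the trained weights of all sources that call `label` at `pos`.
        return sum(weights.get((src, calls[pos]), 0.5)
                   for src, calls in evidence.items() if calls[pos] == label)

    def step_cost(prev, label):
        return switch_penalty if prev != label else 0.0

    best = {label: vote(0, label) for label in LABELS}
    back = []
    for pos in range(1, n):
        new_best, pointers = {}, {}
        for label in LABELS:
            prev = max(LABELS, key=lambda p: best[p] - step_cost(p, label))
            new_best[label] = best[prev] - step_cost(prev, label) + vote(pos, label)
            pointers[label] = prev
        back.append(pointers)
        best = new_best

    # Trace back from the best final label to recover the full path.
    label = max(LABELS, key=lambda lab: best[lab])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Made-up training data: "good" always agrees with the truth, "noisy" rarely does.
truth = ["coding"] * 3 + ["noncoding"] * 3
training = {
    "good": list(truth),
    "noisy": ["noncoding", "coding"] * 3,
}
weights = train_weights(training, truth)

# Conflicting evidence on a new four-position sequence.
new_evidence = {
    "good": ["coding", "coding", "noncoding", "noncoding"],
    "noisy": ["noncoding", "coding", "coding", "noncoding"],
}
consensus = combine(new_evidence, weights)
```

In this made-up example the source that was more reliable in training receives the larger weight, so where the two sources disagree the consensus follows it. The switch penalty stands in, very loosely, for the structural constraints a real gene model would impose.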
Pages: 3596-3603
Page count: 8