JIGSAW: integration of multiple sources of evidence for gene prediction

Times cited: 87
Authors
Allen, JE [1]
Salzberg, SL
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Adv Comp Studies, Dept Comp Sci, College Pk, MD 20742 USA
[3] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
DOI
10.1093/bioinformatics/bti609
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Computational gene finding systems play an important role in finding new human genes, although no system is yet accurate enough to predict all, or even most, protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods still have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models.

Results: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to that of ab initio gene finding methods and of other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy compared with existing methods.
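The Results paragraph describes two steps: weighting each evidence source with statistics from a training set, then combining the weighted evidence by dynamic programming. The Python sketch below illustrates that general idea under strong simplifying assumptions; it is not JIGSAW's actual model. All of the following are hypothetical: evidence sources report exons as inclusive (start, end) intervals; a source's weight is the log-odds of its exon-level precision on training data (with a pseudocount); a candidate exon's score is the summed weight of the sources that predict it exactly; and the dynamic program is plain weighted interval scheduling, which picks the highest-scoring set of non-overlapping exons while ignoring splice signals, reading frames, and gene boundaries. The function names, source names, and coordinates are invented for illustration.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an exon is an inclusive (start, end) interval.
Exon = Tuple[int, int]


def train_weights(predictions: Dict[str, Set[Exon]], truth: Set[Exon],
                  pseudocount: float = 1.0) -> Dict[str, float]:
    """Weight each source by the log-odds of its exon-level precision on training data."""
    weights: Dict[str, float] = {}
    for source, exons in predictions.items():
        correct = len(exons & truth)
        wrong = len(exons) - correct
        weights[source] = math.log((correct + pseudocount) / (wrong + pseudocount))
    return weights


def score_candidates(predictions: Dict[str, Set[Exon]],
                     weights: Dict[str, float]) -> Dict[Exon, float]:
    """Score each candidate exon as the summed weight of the sources predicting it exactly."""
    scores: Dict[Exon, float] = {}
    for source, exons in predictions.items():
        for exon in exons:
            scores[exon] = scores.get(exon, 0.0) + weights[source]
    return scores


def best_chain(scores: Dict[Exon, float]) -> Tuple[List[Exon], float]:
    """Weighted interval scheduling: choose non-overlapping exons with maximal total score."""
    cands = sorted((e for e, s in scores.items() if s > 0), key=lambda e: e[1])
    ends = [end for _, end in cands]
    n = len(cands)
    best = [0.0] * (n + 1)    # best[i]: optimum using the first i candidates
    take = [False] * (n + 1)  # whether candidate i is used in best[i]
    prev = [0] * (n + 1)      # number of earlier candidates compatible with candidate i
    for i in range(1, n + 1):
        start, _ = cands[i - 1]
        # Earlier candidates ending before `start` do not overlap candidate i.
        prev[i] = bisect_right(ends, start - 1, 0, i - 1)
        with_it = best[prev[i]] + scores[cands[i - 1]]
        if with_it > best[i - 1]:
            best[i], take[i] = with_it, True
        else:
            best[i] = best[i - 1]
    # Trace back the chosen exons.
    chain: List[Exon] = []
    i = n
    while i > 0:
        if take[i]:
            chain.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    return chain[::-1], best[n]


# Invented training data: three evidence sources and a curated "truth" set.
train = {
    "abinitio": {(10, 50), (100, 150), (205, 260), (300, 340)},
    "protein": {(10, 50), (100, 150)},
    "est": {(100, 150), (205, 260), (400, 420)},
}
truth = {(10, 50), (100, 150), (205, 260)}
weights = train_weights(train, truth)

# Invented evidence for a new region; (60, 90) and (85, 120) conflict.
region = {
    "abinitio": {(5, 40), (60, 90), (85, 120)},
    "protein": {(60, 90)},
    "est": {(85, 120)},
}
chain, total = best_chain(score_candidates(region, weights))
print(chain, round(total, 3))  # [(5, 40), (60, 90)] 2.485
```

Here the conflict between (60, 90) and (85, 120) is resolved in favor of the exon backed by the more reliable sources on the training set. A realistic combiner would also model exon/intron transitions and reward partial agreement between sources; weighted interval scheduling is used only because it is about the simplest dynamic program that resolves overlapping, conflicting evidence.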
Pages: 3596-3603
Page count: 8