Global discriminative learning for higher-accuracy computational gene prediction

Cited by: 51
Authors
Bernal, Axel
Crammer, Koby
Hatzigeorgiou, Artemis
Pereira, Fernando
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Genet, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA)
DOI
10.1371/journal.pcbi.0030054
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
Pages: 488-497
Page count: 10