Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment

Cited by: 50
Authors
Bajic, Vladimir B. [1]
Brent, Michael R.
Brown, Randall H.
Frankish, Adam
Harrow, Jennifer
Ohler, Uwe
Solovyev, Victor V.
Tan, Sin Lam
Affiliations
[1] Univ Western Cape, S African Natl Bioinformat Inst, ZA-7535 Bellville, South Africa
[2] Washington Univ, Dept Comp Sci, Lab Comp Genom, St Louis, MO USA
[3] Wellcome Trust Sanger Inst, Human & Vertebrate Anal & Annotat Grp, Cambridge CB10 1SA, England
[4] Wellcome Trust Sanger Inst, Hinxton CB10 1HH, Cambs, England
[5] Duke Univ, Inst Genome Sci & Policy, Durham, NC 27708 USA
[6] Univ London, Royal Holloway, London, England
[7] Inst Infocom Res, Knowledge Extract Lab, Inst Infocomm Res, Singapore 119613, Singapore
DOI
10.1186/gb-2006-7-s1-s3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends.
Results: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions.
Conclusions: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
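The abstract reports sensitivity, positive predictive value, and average distance mismatch under a maximum allowed distance of 1,000 nucleotides. The following is a minimal sketch of how such figures could be computed, not the procedure used in the paper: the function names `nearest_distance` and `assess` are hypothetical, positions are assumed to be integer coordinates on a single sequence, strand is ignored, and the counting rules (a prediction counts as correct if any reference site lies within the cutoff; a reference site counts as recovered if any prediction lies within the cutoff) are assumptions, since this record does not state the exact definitions.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_sites):
    """Distance from `position` to the closest entry of a sorted, non-empty list."""
    i = bisect_left(sorted_sites, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(position - site) for site in candidates)


def assess(predictions, reference_tss, max_distance=1000):
    """Return (sensitivity, ppv, mean_distance) under assumed counting rules.

    Both input lists must be non-empty. mean_distance is None when no
    prediction falls within max_distance of a reference site.
    """
    ref = sorted(reference_tss)
    preds = sorted(predictions)

    # Assumed rule: a prediction is a true positive if some reference
    # transcription start site lies within max_distance of it.
    tp_distances = [
        d for d in (nearest_distance(p, ref) for p in preds) if d <= max_distance
    ]

    # Assumed rule: a reference site is recovered if some prediction
    # lies within max_distance of it.
    recovered = sum(1 for r in ref if nearest_distance(r, preds) <= max_distance)

    sensitivity = recovered / len(ref)
    ppv = len(tp_distances) / len(preds)
    mean_distance = sum(tp_distances) / len(tp_distances) if tp_distances else None
    return sensitivity, ppv, mean_distance


if __name__ == "__main__":
    # Toy coordinates chosen for illustration only; not data from the study.
    print(assess(predictions=[1200, 5600, 20000], reference_tss=[1000, 5000, 9000]))
```

In the toy call, two of the three predictions fall within the cutoff and two of the three reference sites are recovered, so a smaller set of predictions confined near annotated genes would raise the positive predictive value, which is the effect the abstract attributes to combining promoter prediction with gene prediction.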
Pages: 13