A ChIP-Seq Benchmark Shows That Sequence Conservation Mainly Improves Detection of Strong Transcription Factor Binding Sites

Cited by: 12
Authors
Handstad, Tony [1]
Rye, Morten Beck [1]
Drablos, Finn [1]
Saetrom, Pal [1,2]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Canc Res & Mol Med, N-7034 Trondheim, Norway
[2] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7034 Trondheim, Norway
Source
PLOS ONE | 2011 / Volume 6 / Issue 04
Keywords
HUMAN GENOME; REGULATORY ELEMENTS; IDENTIFICATION; PREDICTION; MODELS
DOI
10.1371/journal.pone.0018430
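Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract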
Background: Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial.
Results: Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods.
Conclusions: Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.
Pages: 9
Related papers
26 in total
  • [1] Diversity and Complexity in DNA Recognition by Transcription Factors
    Badis, Gwenael
    Berger, Michael F.
    Philippakis, Anthony A.
    Talukder, Shaheynoor
    Gehrke, Andrew R.
    Jaeger, Savina A.
    Chan, Esther T.
    Metzler, Genita
    Vedenko, Anastasia
    Chen, Xiaoyu
    Kuznetsov, Hanna
    Wang, Chi-Fong
    Coburn, David
    Newburger, Daniel E.
    Morris, Quaid
    Hughes, Timothy R.
    Bulyk, Martha L.
    [J]. SCIENCE, 2009, 324 (5935) : 1720 - 1723
  • [2] Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
    Birney, Ewan
    Stamatoyannopoulos, John A.
    Dutta, Anindya
    Guigo, Roderic
    Gingeras, Thomas R.
    Margulies, Elliott H.
    Weng, Zhiping
    Snyder, Michael
    Dermitzakis, Emmanouil T.
    Stamatoyannopoulos, John A.
    Thurman, Robert E.
    Kuehn, Michael S.
    Taylor, Christopher M.
    Neph, Shane
    Koch, Christoph M.
    Asthana, Saurabh
    Malhotra, Ankit
    Adzhubei, Ivan
    Greenbaum, Jason A.
    Andrews, Robert M.
    Flicek, Paul
    Boyle, Patrick J.
    Cao, Hua
    Carter, Nigel P.
    Clelland, Gayle K.
    Davis, Sean
    Day, Nathan
    Dhami, Pawandeep
    Dillon, Shane C.
    Dorschner, Michael O.
    Fiegler, Heike
    Giresi, Paul G.
    Goldy, Jeff
    Hawrylycz, Michael
    Haydock, Andrew
    Humbert, Richard
    James, Keith D.
    Johnson, Brett E.
    Johnson, Ericka M.
    Frum, Tristan T.
    Rosenzweig, Elizabeth R.
    Karnani, Neerja
    Lee, Kirsten
    Lefebvre, Gregory C.
    Navas, Patrick A.
    Neri, Fidencio
    Parker, Stephen C. J.
    Sabo, Peter J.
    Sandstrom, Richard
    Shafer, Anthony
    [J]. NATURE, 2007, 447 (7146) : 799 - 816
  • [3] Phylogenetic shadowing of primate sequences to find functional regions of the human genome
    Boffelli, D
    McAuliffe, J
    Ovcharenko, D
    Lewis, KD
    Ovcharenko, I
    Pachter, L
    Rubin, EM
    [J]. SCIENCE, 2003, 299 (5611) : 1391 - 1394
  • [4] BRYNE J, 2007, NUCL ACIDS RES
  • [5] What are DNA sequence motifs?
    D'haeseleer, P
    [J]. NATURE BIOTECHNOLOGY, 2006, 24 (04) : 423 - 425
  • [6] Locating mammalian transcription factor binding sites: A survey of computational and experimental techniques
    Elnitski, Laura
    Jin, Victor X.
    Farnham, Peggy J.
    Jones, Steven J. M.
    [J]. GENOME RESEARCH, 2006, 16 (12) : 1455 - 1464
  • [7] Integrating multiple evidence sources to predict transcription factor binding in the human genome
    Ernst, Jason
    Plasterer, Heather L.
    Simon, Itamar
    Bar-Joseph, Ziv
    [J]. GENOME RESEARCH, 2010, 20 (04) : 526 - 536
  • [8] THE MEANING AND USE OF THE AREA UNDER A RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
    HANLEY, JA
    MCNEIL, BJ
    [J]. RADIOLOGY, 1982, 143 (01) : 29 - 36
  • [9] Assessing phylogenetic motif models for predicting transcription factor binding sites
    Hawkins, John
    Grant, Charles
    Noble, William Stafford
    Bailey, Timothy L.
    [J]. BIOINFORMATICS, 2009, 25 (12) : I339 - I347
  • [10] Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data
    Jothi, Raja
    Cuddapah, Suresh
    Barski, Artem
    Cui, Kairong
    Zhao, Keji
    [J]. NUCLEIC ACIDS RESEARCH, 2008, 36 (16) : 5221 - 5231