Integrating multiple evidence sources to predict transcription factor binding in the human genome

Cited by: 83
Authors
Ernst, Jason [1]
Plasterer, Heather L. [2]
Simon, Itamar [3]
Bar-Joseph, Ziv [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Machine Learning Dept, Pittsburgh, PA 15213 USA
[2] Whitehead Inst Biomed Res, Cambridge, MA 02142 USA
[3] Hebrew Univ Jerusalem, Sch Med, Dept Mol Biol, IL-91120 Jerusalem, Israel
Funding
National Science Foundation (USA);
Keywords
REGULATORY MOTIFS; VERTEBRATE; CHROMATIN; SITES; EXPRESSION; DISCOVERY; SEQUENCE; BROWSER; REVEALS; MODULES;
DOI
10.1101/gr.096305.109
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The binding preferences of many transcription factors are known and characterized by a sequence binding motif. However, determining the regions of the genome in which a transcription factor binds based on its motif is a challenging problem, particularly in species with large genomes, since many sequences often contain matches to the motif but are not bound. Several rules based on sequence conservation or location relative to a transcription start site have been proposed to help differentiate true binding sites from random ones. Other evidence sources may also be informative for this task. We developed a method for integrating multiple evidence sources using logistic regression classifiers. Our method works in two steps. First, we infer a score quantifying the general preference for transcription factor binding at all locations based on a large set of evidence features, without using any motif-specific information. Then, we combine this general binding preference score with motif information for specific transcription factors to improve prediction of regions bound by the factor. Using cross-validation and new experimental data, we show that, surprisingly, the general binding preference can be highly predictive of true locations of transcription factor binding even when no binding motif is used. When combined with motif information, our method outperforms previous methods for predicting locations of true binding.
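The two-step scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the evidence features (`conservation`, `tss_proximity`), the synthetic data, and the plain gradient-descent fitting routine are all assumptions made for illustration. Step 1 fits a logistic regression on motif-independent evidence to obtain a general binding preference score; step 2 fits a second logistic regression on that score plus a motif score for one hypothetical factor.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic regions (assumption): two motif-independent evidence features,
# one motif score, and two labels (bound by any factor / bound by this factor).
rng = random.Random(0)
evidence, motif, any_bound, tf_bound = [], [], [], []
for _ in range(400):
    conservation, tss_proximity, m = rng.random(), rng.random(), rng.random()
    bound_any = 1 if conservation + tss_proximity + rng.gauss(0, 0.1) > 1.0 else 0
    bound_tf = 1 if bound_any and m + rng.gauss(0, 0.1) > 0.5 else 0
    evidence.append([conservation, tss_proximity])
    motif.append(m)
    any_bound.append(bound_any)
    tf_bound.append(bound_tf)

# Step 1: general binding preference from evidence features only (no motif).
general_model = fit_logistic(evidence, any_bound)
general_score = [predict_proba(general_model, x) for x in evidence]

# Step 2: combine the general score with the factor's motif score.
combined = [[g, m] for g, m in zip(general_score, motif)]
tf_model = fit_logistic(combined, tf_bound)
scores = [predict_proba(tf_model, x) for x in combined]

def mean(values):
    return sum(values) / len(values)

bound_mean = mean([s for s, t in zip(scores, tf_bound) if t == 1])
unbound_mean = mean([s for s, t in zip(scores, tf_bound) if t == 0])
print(f"mean score, bound regions:   {bound_mean:.3f}")
print(f"mean score, unbound regions: {unbound_mean:.3f}")
```

Separating the two steps means the general score is learned once from all available binding data and can then be reused for any factor that has a motif, which is the property the abstract highlights. A realistic evaluation would score held-out regions, as in the cross-validation the abstract mentions, rather than the training regions used here.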
Pages: 526-536
Page count: 11