Integrating multiple evidence sources to predict transcription factor binding in the human genome

Cited by: 83
Authors
Ernst, Jason [1]
Plasterer, Heather L. [2]
Simon, Itamar [3]
Bar-Joseph, Ziv [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Machine Learning Dept, Pittsburgh, PA 15213 USA
[2] Whitehead Inst Biomed Res, Cambridge, MA 02142 USA
[3] Hebrew Univ Jerusalem, Sch Med, Dept Mol Biol, IL-91120 Jerusalem, Israel
Funding
National Science Foundation (USA);
Keywords
REGULATORY MOTIFS; VERTEBRATE; CHROMATIN; SITES; EXPRESSION; DISCOVERY; SEQUENCE; BROWSER; REVEALS; MODULES;
DOI
10.1101/gr.096305.109
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The binding preferences of many transcription factors are known and characterized by a sequence binding motif. However, determining the regions of the genome in which a transcription factor binds based on its motif is a challenging problem, particularly in species with large genomes, since there are often many sequences that contain matches to the motif but are not bound. Several rules based on sequence conservation or location relative to a transcription start site have been proposed to help differentiate true binding sites from random ones. Other evidence sources may also be informative for this task. We developed a method for integrating multiple evidence sources using logistic regression classifiers. Our method works in two steps. First, we infer a score quantifying the general binding preference of transcription factors at all locations based on a large set of evidence features, without using any motif-specific information. Then, we combine this general binding preference score with motif information for specific transcription factors to improve prediction of the regions bound by the factor. Using cross-validation and new experimental data, we show that, surprisingly, the general binding preference can be highly predictive of true locations of transcription factor binding even when no binding motif is used. When combined with motif information, our method outperforms previous methods for predicting locations of true binding.
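The two-step scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the two evidence features (a conservation score and a proximity-to-transcription-start-site score), the motif score, and all function and variable names are hypothetical, and the plain gradient-descent fit stands in for whatever training procedure the paper actually uses.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Probability from weights w = [bias, w1, ...] and feature vector x."""
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))


def fit_logistic(X, y, lr=1.0, epochs=800):
    """Fit logistic regression by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic regions (assumption: all values are made up for illustration).
random.seed(0)
evidence, motif_scores, bound_any, bound_tf = [], [], [], []
for _ in range(300):
    conservation, tss_proximity = random.random(), random.random()
    motif = random.random()
    # A region is bound by *some* factor more often when evidence is strong.
    any_bound = random.random() < sigmoid(4.0 * (conservation + tss_proximity - 1.0))
    # The specific factor binds only such regions, preferring strong motif matches.
    tf_bound = any_bound and random.random() < sigmoid(6.0 * (motif - 0.5))
    evidence.append([conservation, tss_proximity])
    motif_scores.append(motif)
    bound_any.append(int(any_bound))
    bound_tf.append(int(tf_bound))

# Step 1: general binding preference score from evidence features only (no motif).
w_general = fit_logistic(evidence, bound_any)
general_score = [predict(w_general, x) for x in evidence]

# Step 2: combine the general score with the motif score for one specific factor.
combined_features = [[g, m] for g, m in zip(general_score, motif_scores)]
w_tf = fit_logistic(combined_features, bound_tf)
combined = [predict(w_tf, x) for x in combined_features]

print("general weights [bias, conservation, tss]:", w_general)
print("factor weights  [bias, general score, motif]:", w_tf)
```

Keeping step 1 free of motif information means the general score can be computed once and reused for any factor; only the small step 2 model is factor-specific.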
Pages: 526-536
Page count: 11