An effective statistical evaluation of ChIPseq dataset similarity

Cited by: 50
Authors
Chikina, Maria D. [2]
Troyanskaya, Olga G. [1]
Affiliations
[1] Princeton Univ, Lewis Sigler Inst Integrat Genom, Dept Comp Sci & Mol Biol, Princeton, NJ 08540 USA
[2] Mt Sinai Sch Med, Dept Neurol, New York, NY 10029 USA
Keywords
EMBRYONIC STEM-CELLS; REGULATORY ELEMENTS; PLURIPOTENCY; SEQ; MYC; DIFFERENTIATION; NETWORK; NANOG; LIVER; STATE
DOI
10.1093/bioinformatics/bts009
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: ChIPseq is rapidly becoming a common technique for investigating protein-DNA interactions. However, results from individual experiments provide only a limited understanding of chromatin structure, because many chromatin factors cooperate in complex ways to orchestrate transcription. Quantifying chromatin interactions therefore requires a robust similarity metric applicable to ChIPseq data. Unfortunately, moving beyond simple overlap calculations to statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metric, with significance estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological variation introduced during the experiment and post-processing.
Results: We show that ChIPseq datasets can in fact be compared through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets, such as peak width, and it rigorously models peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as the mappable genome or promoter regions). Applying our method to the well-studied dataset of Chen et al. (2008), we identify novel interactions that agree well with our biological understanding. By comparing ChIPseq data asymmetrically, we observe clear differences in interactions between cofactors such as p300 and factors that bind DNA directly.
Availability: Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz
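The Results paragraph describes three ingredients: an exact (permutation-free) proximity P-value, conditioning on a restricted genomic domain, and an asymmetric query-versus-reference comparison. The Python sketch below illustrates one way these ingredients can fit together; it is not the IntervalStats code, and the null model is our assumption rather than a statement of the paper's method. Assumptions: peaks are half-open integer intervals [start, end) on a single chromosome (as in the BED convention); each query peak is reduced to its midpoint, so peak width does not enter the statistic; and under the null a point is placed uniformly at random within the domain. All function names (merge, distance, covered_length, proximity_pvalue, asymmetric_similarity) and the alpha threshold are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical representation: half-open interval [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def distance(x: int, ref: List[Interval]) -> int:
    """Distance from position x to the nearest interval of a merged, non-empty list."""
    starts = [s for s, _ in ref]
    i = bisect_right(starts, x)  # ref[i - 1] is the last interval starting at or before x
    candidates: List[int] = []
    if i > 0:
        end = ref[i - 1][1]
        candidates.append(0 if x < end else x - (end - 1))
    if i < len(ref):
        candidates.append(ref[i][0] - x)  # next interval lies to the right of x
    return min(candidates)


def covered_length(ref: List[Interval], d: int, domain: List[Interval]) -> int:
    """Count domain positions lying within distance d of some reference interval."""
    expanded = merge([(s - d, e + d) for s, e in ref])
    total = 0
    # Both lists are disjoint, so summing pairwise overlaps gives the exact size.
    for ds, de in domain:
        for s, e in expanded:
            lo, hi = max(ds, s), min(de, e)
            if lo < hi:
                total += hi - lo
    return total


def proximity_pvalue(x: int, ref: List[Interval], domain: List[Interval]) -> float:
    """P(a uniform random domain position is at least as close to ref as x is)."""
    ref, domain = merge(ref), merge(domain)
    if not ref:
        return 1.0
    d = distance(x, ref)
    return covered_length(ref, d, domain) / sum(e - s for s, e in domain)


def asymmetric_similarity(queries: List[Interval], ref: List[Interval],
                          domain: List[Interval], alpha: float = 0.05) -> float:
    """Fraction of in-domain query midpoints whose proximity P-value is <= alpha."""
    domain = merge(domain)
    mids = [(s + e) // 2 for s, e in queries]  # midpoints: ignore peak width
    mids = [m for m in mids if any(ds <= m < de for ds, de in domain)]
    if not mids:
        return 0.0
    hits = sum(proximity_pvalue(m, ref, domain) <= alpha for m in mids)
    return hits / len(mids)
```

For example, with one reference peak [100, 110), a domain of [0, 1000), and query peaks [100, 110) and [190, 210), the query midpoints receive P-values 0.01 and 0.192, so the similarity at alpha = 0.05 is 0.5; swapping the roles of the two sets gives 1.0, which illustrates why the comparison is asymmetric. Restricting the domain (for instance to promoter regions) changes both which queries are counted and the denominator of every P-value.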
Pages: 607-613
Number of pages: 7
Related papers (26 in total)
[1] Carstensen, Lisbeth; Sandelin, Albin; Winther, Ole; Hansen, Niels R. Multivariate Hawkes process models of the occurrence of regulatory elements. BMC Bioinformatics, 2010, 11.
[2] Chen, Xi; Xu, Han; Yuan, Ping; Fang, Fang; Huss, Mikael; Vega, Vinsensius B.; Wong, Eleanor; Orlov, Yuriy L.; Zhang, Weiwei; Jiang, Jianming; Loh, Yuin-Han; Yeo, Hock Chuan; Yeo, Zhen Xuan; Narang, Vipin; Govindarajan, Kunde Ramamoorthy; Leong, Bernard; Shahab, Atif; Ruan, Yijun; Bourque, Guillaume; Sung, Wing-Kin; Clarke, Neil D.; Wei, Chia-Lin; Ng, Huck-Hui. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell, 2008, 133(6): 1106-1117.
[3] Cuddapah, Suresh; Jothi, Raja; Schones, Dustin E.; Roh, Tae-Young; Cui, Kairong; Zhao, Keji. Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Research, 2009, 19(1): 24-32.
[4] Fu, A.Q. Molecular BioSystems, 2009, 5: 1429. DOI: 10.1039/b906880e.
[5] Guttman, Mitchell; Amit, Ido; Garber, Manuel; French, Courtney; Lin, Michael F.; Feldser, David; Huarte, Maite; Zuk, Or; Carey, Bryce W.; Cassady, John P.; Cabili, Moran N.; Jaenisch, Rudolf; Mikkelsen, Tarjei S.; Jacks, Tyler; Hacohen, Nir; Bernstein, Bradley E.; Kellis, Manolis; Regev, Aviv; Rinn, John L.; Lander, Eric S. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 2009, 458(7235): 223-227.
[6] Hoffman, Brad G.; Robertson, Gordon; Zavaglia, Bogard; Beach, Mike; Cullum, Rebecca; Lee, Sam; Soukhatcheva, Galina; Li, Leping; Wederell, Elizabeth D.; Thiessen, Nina; Bilenky, Mikhail; Cezard, Timothee; Tam, Angela; Kamoh, Baljit; Birol, Inanc; Dai, Derek; Zhao, Yongjun; Hirst, Martin; Verchere, C. Bruce; Helgason, Cheryl D.; Marra, Marco A.; Jones, Steven J. M.; Hoodless, Pamela A. Locus co-occupancy, nucleosome positioning, and H3K4me1 regulate the functionality of FOXA2-, HNF4A-, and PDX1-bound loci in islets and liver. Genome Research, 2010, 20(8): 1037-1051.
[7] Huen, David S.; Russell, Steven. On the use of resampling tests for evaluating statistical significance of binding-site co-occurrence. BMC Bioinformatics, 2010, 11.
[8] Janknecht, R.; Hunter, T. Transcriptional control: Versatile molecular glue. Current Biology, 1996, 6(8): 951-954.
[9] Johnson, David S.; Mortazavi, Ali; Myers, Richard M.; Wold, Barbara. Genome-wide mapping of in vivo protein-DNA interactions. Science, 2007, 316(5830): 1497-1502.
[10] Kunisato, Atsushi; Wakatsuki, Mariko; Kodama, Yuuki; Shinba, Haruna; Ishida, Isao; Nagao, Kenji. Generation of induced pluripotent stem cells by efficient reprogramming of adult bone marrow cells. Stem Cells and Development, 2010, 19(2): 229-238.