Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks

Cited by: 157
Authors
Nix, David A. [1]
Courdy, Samir J. [1]
Boucher, Kenneth M. [2]
Affiliations
[1] Univ Utah, Dept Res Informat, Huntsman Canc Inst, Salt Lake City, UT 84105 USA
[2] Univ Utah, Dept Oncol Sci, Salt Lake City, UT 84105 USA
Funding
US National Institutes of Health (NIH)
DOI
10.1186/1471-2105-9-523
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: High-throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure, and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest, and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation.
Results: Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2-3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR-validated NRSF sites and the presence of NRSF binding motifs for setting thresholds.
Conclusion: The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/.
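The abstract names four per-window statistics (a normalized difference score, a binomial p-value, a q-value, and an empirical FDR) without giving formulas. The sketch below illustrates one common way such quantities can be computed; it is not taken from the paper or from the USeq code. The function names are hypothetical, and the specific choices are assumptions: the score form (t - c) / sqrt(t + c), a binomial null with p = 0.5 (equal-depth ChIP and input libraries), Benjamini-Hochberg adjustment for the q-values, and an input-versus-input score list as the null for the empirical FDR.

```python
from math import comb, sqrt

# Illustrative only: formulas and names are assumptions, not the paper's code.

def normalized_difference(t, c):
    """Assumed score (t - c) / sqrt(t + c) for ChIP count t and input count c."""
    total = t + c
    return (t - c) / sqrt(total) if total else 0.0

def binomial_p_value(t, c, p=0.5):
    """Upper-tail P(X >= t) for X ~ Binomial(t + c, p).

    p = 0.5 assumes ChIP and input libraries of equal depth.
    """
    n = t + c
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def empirical_fdr(threshold, chip_scores, null_scores):
    """Null windows at or above threshold divided by ChIP windows at or above it."""
    false_hits = sum(s >= threshold for s in null_scores)
    total_hits = sum(s >= threshold for s in chip_scores)
    return min(1.0, false_hits / total_hits) if total_hits else 0.0

# Hypothetical (ChIP, input) read counts for four genomic windows.
windows = [(30, 5), (12, 10), (8, 9), (50, 4)]
scores = [normalized_difference(t, c) for t, c in windows]
q_values = bh_q_values([binomial_p_value(t, c) for t, c in windows])
```

Windows could then be ranked by score and thresholded at, say, a q-value of 0.05, mirroring the 5% FDR cutoff mentioned in the abstract. A global Poisson p-value, by contrast, would use only the ChIP counts and a genome-wide rate, ignoring the local input signal.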
Pages: 9
Related papers
15 in total
[1]  
Barski A., CELL, V129, P823
[2]   Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE) [J].
Bhinge, Akshay A. ;
Kim, Jonghwan ;
Euskirchen, Ghia M. ;
Snyder, Michael ;
Iyer, Vishwanath R. .
GENOME RESEARCH, 2007, 17 (06) :910-916
[3]   DNA microarray technologies for measuring protein-DNA interactions [J].
Bulyk, Martha L. .
CURRENT OPINION IN BIOTECHNOLOGY, 2006, 17 (04) :422-430
[4]  
COLLAS P, FRONT BIOSCI, V13, P929
[5]  
COX AJ, 2008, ILLUMINA PIPELINE V0
[6]  
FEJES AP, BIOINFORMATICS, V24, P1729
[7]   Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets [J].
Johnson, David S. ;
Li, Wei ;
Gordon, D. Benjamin ;
Bhattacharjee, Arindam ;
Curry, Bo ;
Ghosh, Jayati ;
Brizuela, Leonardo ;
Carroll, Jason S. ;
Brown, Myles ;
Flicek, Paul ;
Koch, Christoph M. ;
Dunham, Ian ;
Bieda, Mark ;
Xu, Xiaoqin ;
Farnham, Peggy J. ;
Kapranov, Philipp ;
Nix, David A. ;
Gingeras, Thomas R. ;
Zhang, Xinmin ;
Holster, Heather ;
Jiang, Nan ;
Green, Roland D. ;
Song, Jun S. ;
Mccuine, Scott A. ;
Anton, Elizabeth ;
Nguyen, Loan ;
Trinklein, Nathan D. ;
Ye, Zhen ;
Ching, Keith ;
Hawkins, David ;
Ren, Bing ;
Scacheri, Peter C. ;
Rozowsky, Joel ;
Karpikov, Alexander ;
Euskirchen, Ghia ;
Weissman, Sherman ;
Gerstein, Mark ;
Snyder, Michael ;
Yang, Annie ;
Moqtaderi, Zarmik ;
Hirsch, Heather ;
Shulha, Hennady P. ;
Fu, Yutao ;
Weng, Zhiping ;
Struhl, Kevin ;
Myers, Richard M. ;
Lieb, Jason D. ;
Liu, X. Shirley .
GENOME RESEARCH, 2008, 18 (03) :393-403
[8]  
JOHNSON DS, SCIENCE, V316, P1497
[9]  
MIKKELSEN TS, NATURE, V448, P553
[10]  
Ng P, 2007, CURR PROTOC MOL BIOL