Combining multiple ChIP-seq peak detection systems using combinatorial fusion

Cited by: 23
Authors
Schweikert, Christina [1]
Brown, Stuart [2]
Tang, Zuojian [2]
Smith, Phillip R. [2]
Hsu, D. Frank [1]
Affiliations
[1] Fordham Univ, Dept Comp & Informat Sci, Lab Informat & Data Min, New York, NY 10023 USA
[2] NYU, Langone Med Ctr, Ctr Hlth Informat & Bioinformat, New York, NY 10016 USA
Source
BMC GENOMICS | 2012 / Vol. 13
Keywords
FACTOR-BINDING SITES; GENOME-WIDE ANALYSIS; DNA; RANK; IDENTIFICATION; ENRICHMENT; SELECTION; CRITERIA;
DOI
10.1186/1471-2164-13-S8-S12
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: ChIP-seq technology, which uses high-throughput next-generation DNA sequencing to identify the targets of chromatin immunoprecipitation, has developed rapidly in recent years. The growing volume of sequencing data it generates provides greater opportunity to analyze genome-wide protein-DNA interactions. In particular, we are interested in evaluating and enhancing computational and statistical techniques for locating protein binding sites. Many peak detection systems have been developed; in this study, we use the following six: CisGenome, MACS, PeakSeq, QuEST, SISSRs, and TRLocator. Results: We define two methods to merge and rescore the regions of two peak detection systems and analyze their performance based on average precision and coverage of transcription start sites. The results indicate that ChIP-seq peak detection can be improved by fusion using score or rank combination. Conclusion: Our method of combination and fusion analysis provides a means for generic assessment of available technologies and systems and can assist researchers in choosing an appropriate system (or fusion method) for analyzing ChIP-seq data. This analysis offers an alternative approach for increasing true positive rates while decreasing false positive rates, and hence for improving the ChIP-seq peak identification process.
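The abstract does not spell out how the two fusion methods rescore regions, so the following is only a minimal sketch of generic score combination and rank combination for two systems. It assumes the regions have already been merged so that both systems score the same region identifiers, that scores are min-max normalized before averaging, and that a simple average is used; the function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {region: 1.0 for region in scores}
    return {region: (s - lo) / (hi - lo) for region, s in scores.items()}


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 is the highest score; ties are broken by region id for determinism."""
    ordered = sorted(scores, key=lambda region: (-scores[region], region))
    return {region: i + 1 for i, region in enumerate(ordered)}


def score_combination(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Average of normalized scores over shared regions (higher is better)."""
    na, nb = normalize(a), normalize(b)
    return {region: (na[region] + nb[region]) / 2 for region in a.keys() & b.keys()}


def rank_combination(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Average of ranks over shared regions (lower is better)."""
    ra, rb = ranks(a), ranks(b)
    return {region: (ra[region] + rb[region]) / 2 for region in a.keys() & b.keys()}


# Toy scores for four merged regions from two hypothetical peak callers.
system_a = {"r1": 10.0, "r2": 4.0, "r3": 7.0, "r4": 1.0}
system_b = {"r1": 0.5, "r2": 0.9, "r3": 0.8, "r4": 0.1}

by_score = score_combination(system_a, system_b)
by_rank = rank_combination(system_a, system_b)
print(sorted(by_score, key=by_score.get, reverse=True))  # best region first
print(sorted(by_rank.items()))
```

The two combinations can order regions differently: score combination keeps the magnitude of each system's confidence, while rank combination keeps only the ordering, which makes it less sensitive to systems that score on very different scales.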
Pages: 12