kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

Cited by: 102
Authors
Fletez-Brant, Christopher [1]
Lee, Dongwon [2]
McCallion, Andrew S. [1, 3]
Beer, Michael A. [1, 2]
Affiliations
[1] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
[2] Johns Hopkins Univ, Sch Med, Dept Biomed Engn, Baltimore, MD 21205 USA
[3] Johns Hopkins Univ, Dept Mol & Comparat Pathobiol, Sch Med, Baltimore, MD 21205 USA
Keywords
TRANSCRIPTION FACTOR; MOTIF DISCOVERY; CHIP-SEQ; CHROMATIN; INTEGRATION; ENHANCERS; NETWORK; REGIONS
DOI
10.1093/nar/gkt519
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21: 2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server that allows the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic data sets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
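
The idea of using kmer sequence features with an SVM can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`reverse_complement`, `kmer_counts`, `spectrum_kernel`) are hypothetical, and both the merging of each kmer with its reverse complement and the cosine-style normalization are assumptions that the abstract does not confirm. The sketch only shows how two DNA sequences could be compared by their shared kmer content, which is the kind of similarity an SVM kernel needs.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count kmers of length k in seq.

    Assumption (not from the abstract): a kmer and its reverse
    complement are counted as one feature, keyed by whichever of
    the two sorts first. Windows containing non-ACGT characters
    are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def spectrum_kernel(seq_a, seq_b, k):
    """Normalized dot product of two kmer count vectors.

    Returns a value between 0.0 (no shared kmers) and 1.0
    (identical kmer profiles). The normalization is an assumption
    made for this illustration.
    """
    a = kmer_counts(seq_a, k)
    b = kmer_counts(seq_b, k)
    norm = (sum(v * v for v in a.values())
            * sum(v * v for v in b.values())) ** 0.5
    if norm == 0:
        return 0.0
    shared = a.keys() & b.keys()
    return sum(a[key] * b[key] for key in shared) / norm
```

A matrix of such pairwise values, computed between positive regions (for example, ChIP-seq peaks) and negative background regions, could then be handed to any SVM implementation that accepts a precomputed kernel.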
Pages: W544-W556
Page count: 13