De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis

Cited by: 55
Authors
Boeva, Valentina [1 ,2 ,3 ,4 ]
Surdez, Didier [1 ,2 ]
Guillon, Noelle [1 ,2 ]
Tirode, Franck [1 ,2 ]
Fejes, Anthony P. [5 ]
Delattre, Olivier [1 ,2 ]
Barillot, Emmanuel [1 ,3 ,4 ]
Affiliations
[1] Inst Curie, F-75248 Paris, France
[2] INSERM, U830, F-75248 Paris, France
[3] INSERM, U900, F-75248 Paris, France
[4] Mines ParisTech, F-77300 Fontainebleau, France
[5] BC Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4S6, Canada
Keywords
HUMAN GENOME; CHROMATIN IMMUNOPRECIPITATION; EWING SARCOMA; PROTEIN; CELLS; TECHNOLOGY; PROJECT; DESIGN; TARGET; DOMAIN;
DOI
10.1093/nar/gkq217
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to ~150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression (positively as often as negatively) and at much larger distances (up to ~1 Mb).
Pages: e126
Page count: 9