Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites

Cited by: 240
Authors
Xie, Xiaohui
Mikkelsen, Tarjei S.
Gnirke, Andreas
Lindblad-Toh, Kerstin
Kellis, Manolis
Lander, Eric S. [1]
Affiliations
[1] MIT, Broad Inst, Cambridge, MA 02142 USA
[2] Harvard Univ, Sch Med, Cambridge, MA 02142 USA
[3] MIT, Div Hlth Sci & Technol, Cambridge, MA 02139 USA
[4] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[5] MIT, Dept Biol, Cambridge, MA 02139 USA
[6] Whitehead Inst Biomed Res, Cambridge, MA 02142 USA
Keywords
comparative genomics; conserved noncoding element
DOI
10.1073/pnas.0701811104
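Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]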
Discipline classification code
07; 0710; 09
Abstract
Conserved noncoding elements (CNEs) constitute the majority of sequences under purifying selection in the human genome, yet their function remains largely unknown. Experimental evidence suggests that many of these elements play regulatory roles, but little is known about regulatory motifs contained within them. Here we describe a systematic approach to discover and characterize regulatory motifs within mammalian CNEs by searching for long motifs (12-22 nt) with significant enrichment in CNEs and studying their biochemical and genomic properties. Our analysis identifies 233 long motifs (LMs), matching a total of approximately 60,000 conserved instances across the human genome. These motifs include 16 previously known regulatory elements, such as the histone 3'-UTR motif and the neuron-restrictive silencer element, as well as striking examples of novel functional elements. The most highly enriched motif (LM1) corresponds to the X-box motif known from yeast and nematode. We show that it is bound by the RFX1 protein and identify thousands of conserved motif instances, suggesting a broad role for the RFX family in gene regulation. A second group of motifs (LM2*) does not match any previously known motif. We demonstrate by biochemical and computational methods that it defines a binding site for the CTCF protein, which is involved in insulator function to limit the spread of gene activation. We identify nearly 15,000 conserved sites that likely serve as insulators, and we show that nearby genes separated by predicted CTCF sites show markedly reduced correlation in gene expression. These sites may thus partition the human genome into domains of expression.
Pages: 7145-7150
Page count: 6