Sequence features that drive human promoter function and tissue specificity

Cited by: 75
Authors
Landolin, Jane M. [2]
Johnson, David S. [3]
Trinklein, Nathan D. [4]
Aldred, Shelly F. [4]
Medina, Catherine [3]
Shulha, Hennady [1]
Weng, Zhiping [1]
Myers, Richard M. [3, 4]
Affiliations
[1] Univ Massachusetts, Program Bioinformat & Integrat Biol, Dept Biochem & Mol Pharmacol, Worcester, MA 01655 USA
[2] Lawrence Berkeley Lab, Div Life Sci, Berkeley, CA 94720 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[4] SwitchGear Gen, Menlo Pk, CA 94025 USA
Keywords
TRANSCRIPTION-FACTOR-BINDING; HUMAN GENOME; REGULATORY MOTIFS; DNA MOTIFS; IDENTIFICATION; SITES; SELECTION; EXPRESSION; DISCOVERY; ELEMENTS;
DOI
10.1101/gr.100370.109
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcriptional factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.
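The abstract compares classifiers by AUC and uses CG dinucleotide content as a baseline predictor of promoter activity. The following is a minimal sketch of those two quantities, not the authors' pipeline: the function names `cg_content` and `auc`, the normalization of CG content by the number of dinucleotide positions, and the toy sequences and labels are all assumptions made for illustration.

```python
from typing import Sequence


def cg_content(seq: str) -> float:
    """Fraction of adjacent dinucleotide positions that read 'CG'.

    Assumption: normalized by (length - 1); the paper may define it differently.
    """
    s = seq.upper()
    if len(s) < 2:
        return 0.0
    hits = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return hits / (len(s) - 1)


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen active item scores higher than a randomly chosen inactive one.
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (sequence, active?) pairs, not taken from the study.
promoters = [
    ("CGCGGCGCCGAT", True),
    ("GCGCGCATCGCG", True),
    ("ATATTTAAGCAT", False),
    ("TTCGAATTAATA", False),
]
scores = [cg_content(seq) for seq, _ in promoters]
labels = [active for _, active in promoters]
print("AUC of CG content on toy data:", auc(scores, labels))
```

In this framing, a motif-based model would supply a different `scores` list for the same `labels`, and the two AUC values could then be compared in the way the abstract reports.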
Pages: 890-898
Page count: 9