A transcription factor affinity-based code for mammalian transcription initiation

Times cited: 41
Authors
Megraw, Molly [1]
Pereira, Fernando [2]
Jensen, Shane T. [3]
Ohler, Uwe [1]
Hatzigeorgiou, Artemis G. [2,4]
Affiliations
[1] Duke Univ, Inst Genome Sci & Policy, Durham, NC 27708 USA
[2] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[3] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
[4] Biomed Sci Res Ctr Alexander Fleming, Inst Mol Oncol, Athens, Greece
Keywords
OPEN-ACCESS DATABASE; MICRORNA GENES; CORE PROMOTERS; BINDING SITES; START SITES; RNA; IDENTIFICATION; REVEALS; RECOGNITION; PREDICTION
DOI
10.1101/gr.085449.108
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (~45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5′ ends. Using this high-resolution single-peak model, we predict TSS for ~70% of mammalian microRNAs based on currently available data.
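The auROC figure quoted in the abstract is the area under the receiver operating characteristic curve. Equivalently, it is the probability that a randomly chosen positive example (here, a true single-peak TSS) receives a higher model score than a randomly chosen negative one, with ties counted as one half. The minimal Python sketch below computes it directly from that definition; the function name `auroc` and the score values are hypothetical illustrations, not the authors' evaluation code or data.

```python
from itertools import product


def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney pairwise definition.

    Returns the fraction of (positive, negative) pairs in which the positive
    scores higher; tied pairs contribute one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0   # correctly ordered pair
        elif p == n:
            wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores, for illustration only (not data from the paper).
tss_scores = [0.91, 0.85, 0.78, 0.66]
background_scores = [0.70, 0.40, 0.22, 0.10]
print(auroc(tss_scores, background_scores))  # prints 0.9375
```

With these hypothetical scores, 15 of the 16 positive/negative pairs are ordered correctly, giving 0.9375. The pairwise loop is O(n·m) and only makes the definition explicit; for genome-scale score sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.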
Pages: 644–656
Page count: 13