Computational detection and location of transcription start sites in mammalian genomic DNA

Cited by: 203
Authors
Down, TA [1]
Hubbard, TJP [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, Cambs, England
DOI
10.1101/gr.216102
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcription, the process whereby RNA copies are made from sections of the DNA genome, is directed by promoter regions. These define the transcription start site, and also the set of cellular conditions under which the promoter is active. At least in more complex species, it appears to be common for genes to have several different transcription start sites, which may be active under different conditions. Eukaryotic promoters are complex and fairly diffuse structures, which have proven hard to detect in silico. We show that a novel hybrid machine-learning method is able to build useful models of promoters for >50% of human transcription start sites. We estimate specificity to be >70%, and demonstrate good positional accuracy. Based on the structure of our learned models, we conclude that a signal resembling the well-known TATA box, together with flanking regions of C-G enrichment, are the most important sequence-based signals marking sites of transcriptional initiation at a large class of typical promoters.
Pages: 458-461
Page count: 4