Nucleotide frequency variation across human genes

Cited by: 77
Authors
Louie, E [1]
Ott, J [1]
Majewski, J [1]
Affiliations
[1] Rockefeller Univ, New York, NY 10021 USA
DOI
10.1101/gr.1317703
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
The frequencies of individual nucleotides exhibit significant fluctuations across eukaryotic genes. In this paper, we investigate nucleotide variation across an averaged representation of all known human genes. Such a representation allows us to average out random fluctuations that constitute noise and uncover remarkable systematic trends in nucleotide distributions, particularly near boundaries between genetic elements: the promoter, exons, and introns. We propose that such variations result from differential mutational pressures and from the presence of specific regulatory motifs, such as transcription and splicing factor binding sites. Specifically, we observe significant GC and TA biases (excess of G over C and T over A) in noncoding regions of genes. Such biases are most probably caused by transcription-coupled mismatch repair, an effect that has recently been detected in mammalian genes. Subsequently, we examine the distribution of all hexanucleotides and identify motifs that are overrepresented within regulatory regions. By clustering and aligning such sequences, we recognize families of putative regulatory elements involved in exonic and intronic splicing control, and 3' mRNA processing. Some of our motifs have been identified in prior theoretical and experimental studies, thus validating our approach, but we detect several novel sequences that we propose as candidates for future functional assays and mutation screens for genetic disorders.
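The two quantities named in the abstract, strand bias (excess of G over C, or T over A) and hexanucleotide counts, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the normalized skew formula (x - y) / (x + y) is a commonly used convention assumed here, and the function names `skew` and `hexamer_counts` and the toy sequence are hypothetical.

```python
from collections import Counter


def skew(seq, x, y):
    """Normalized excess of base x over base y: (x - y) / (x + y).

    Assumed convention, not taken from the paper. Returns 0.0 when
    neither base occurs in the sequence.
    """
    seq = seq.upper()
    nx, ny = seq.count(x), seq.count(y)
    total = nx + ny
    return (nx - ny) / total if total else 0.0


def hexamer_counts(seq, k=6):
    """Count overlapping k-mers (k=6 by default) with a sliding window.

    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(kmer for kmer in kmers if set(kmer) <= set("ACGT"))


if __name__ == "__main__":
    toy = "GGTAAGTGGGTTTGCATTTGG"  # made-up sequence, for illustration only
    print("GC skew:", skew(toy, "G", "C"))
    print("TA skew:", skew(toy, "T", "A"))
    print("most common hexamers:", hexamer_counts(toy).most_common(3))
```

To look for position-dependent trends like those described above, the same functions would be applied to windows at fixed offsets from an element boundary and the results averaged over many genes; judging overrepresentation additionally requires a background model, which this sketch does not include.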
Pages: 2594-2601
Page count: 8