Association of nucleotide patterns with gene function classes: application to human 3′ untranslated sequences

Cited by: 20
Authors
Conklin, D
Jonassen, I
Aasland, R
Taylor, WR
Affiliations
[1] ZymoGenetics Inc, Seattle, WA 98102, USA
[2] University of Bergen, Department of Informatics, N-5020 Bergen, Norway
[3] University of Bergen, Department of Molecular Biology, N-5020 Bergen, Norway
[4] National Institute for Medical Research, Division of Mathematical Biology, London NW7 1AA, England
DOI: 10.1093/bioinformatics/18.1.182
CLC classification: Q5 [Biochemistry]
Discipline codes: 071010; 081704
Abstract
Motivation: Gene expression depends on two main types of signals: those involving transcription factors, which initiate gene transcription, and those that regulate the translation of a nascent mRNA. These post-transcriptional events play an important, yet incompletely understood, role in regulating gene expression and cellular behavior. Many of the identified cis-acting elements for translational regulation occur within the 3' untranslated region (3' UTR), and some have been observed to occur with surprising regularity within certain protein function classes.

Results: In this study, we present a new association rule mining method for discovering nucleotide sequence patterns that appear in more sequences than expected within protein function classes. The method is applied to a database of human 3' UTR sequences, and some significant associations between nucleotide patterns and protein function classes are discovered. Among previously identified patterns, the AU-rich element (ARE) is found to occur within the 3' UTRs of cytokines, providing statistical validation of an association often reported in the literature. The method also identifies some GC-rich patterns that occur within the 3' UTRs of homeodomain transcription factors and nuclear proteins. The method should be applicable to many types of regulatory element discovery.

Contact: conklin@zgi.com
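The abstract does not spell out how "more sequences than expected" is scored, so the following is only a minimal sketch of that idea under stated assumptions, not the authors' method. It assumes a pattern is a regular expression that either matches a 3' UTR or not, scores the rule pattern → class with the usual association-rule support and confidence, and swaps in a one-sided hypergeometric tail probability as the significance measure. The function names (`hypergeom_sf`, `score_rule`), the toy sequences and the class labels are hypothetical; the query pattern AUUUA is the pentamer commonly associated with AREs.

```python
import re
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def score_rule(seqs, classes, pattern, target_class):
    """Score the hypothetical rule 'pattern -> target_class'.

    seqs:    dict of sequence id -> 3' UTR string (assumed non-empty)
    classes: dict of sequence id -> set of function-class labels
    pattern: regular expression; a sequence counts once if it matches anywhere
    """
    regex = re.compile(pattern)
    N = len(seqs)
    hits = {sid for sid, s in seqs.items() if regex.search(s)}
    members = {sid for sid in seqs if target_class in classes.get(sid, set())}
    K, n, k = len(hits), len(members), len(hits & members)
    return {
        "observed": k,                      # class members containing the pattern
        "expected": K * n / N,              # mean count under the hypergeometric null
        "support": k / N,                   # fraction of all sequences matching the rule
        "confidence": k / K if K else 0.0,  # P(class | pattern)
        "p_value": hypergeom_sf(k, N, K, n),
    }


# Toy data (hypothetical): six short "3' UTRs" with function-class labels.
utrs = {
    "g1": "GCAUUUAUUUAUUUAGC",
    "g2": "AUUUAUUUACCG",
    "g3": "GGCGCCGCGG",
    "g4": "CCGGAAUUCC",
    "g5": "AUUUAGGG",
    "g6": "GCGCGC",
}
labels = {
    "g1": {"cytokine"},
    "g2": {"cytokine"},
    "g3": {"homeodomain"},
    "g4": {"nuclear"},
    "g5": {"kinase"},
    "g6": {"homeodomain"},
}

result = score_rule(utrs, labels, "AUUUA", "cytokine")
print(result)
```

Counting each sequence at most once, rather than counting pattern occurrences, matches the abstract's phrasing of patterns that "appear in more sequences than expected". On real data many patterns and classes would be tested at once, so the p-values would also need a multiple-testing correction, which this sketch leaves out.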
Pages: 182-189
Number of pages: 8
References: 37