Identification of regulatory elements using a feature selection method

Cited by: 74
Authors
Keles, S [1]
van der Laan, M
Eisen, MB
Affiliations
[1] Univ Calif Berkeley, Div Biostat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[3] Ernest Orlando Lawrence Berkeley Natl Lab, Div Life Sci, Berkeley, CA 94720 USA
DOI
10.1093/bioinformatics/18.9.1167
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Many methods have been described to identify regulatory motifs in the transcription control regions of genes that exhibit similar patterns of gene expression across a variety of experimental conditions. Here we focus on a single experimental condition, and utilize gene expression data to identify sequence motifs associated with genes that are activated under this experimental condition. We use a linear model with two-way interactions to model gene expression as a function of sequence features (words) present in presumptive transcription control regions. The most relevant features are selected by a feature selection method called stepwise selection with Monte Carlo cross-validation. We apply this method to a publicly available dataset of the yeast Saccharomyces cerevisiae, focusing on the 800 base pairs immediately upstream of each gene's translation start site (the upstream control region, UCR).
Results: We successfully identify regulatory motifs that are known to be active under the experimental conditions analyzed, and find additional significant sequences that may represent novel regulatory motifs. We also discuss a complementary method that utilizes gene expression data from a single microarray experiment and allows averaging over a variety of experimental conditions as an alternative to motif finding methods that act on clusters of co-expressed genes.
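The modelling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes word counts as sequence features, ordinary least squares as the linear fit, and a simplified forward stepwise search that stops when the Monte Carlo cross-validated error no longer improves. All function names, the toy word list, and the parameter values (`n_splits`, `test_frac`, `max_terms`) are hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def count_words(seq, words):
    """Count (possibly overlapping) occurrences of each word in a sequence."""
    return [
        sum(seq[i:i + len(w)] == w for i in range(len(seq) - len(w) + 1))
        for w in words
    ]


def design_matrix(seqs, words):
    """Word-count columns followed by all two-way interaction (product) columns."""
    X = np.array([count_words(s, words) for s in seqs], dtype=float)
    pairs = list(itertools.combinations(range(len(words)), 2))
    inter = [X[:, a] * X[:, b] for a, b in pairs]
    names = list(words) + [f"{words[a]}*{words[b]}" for a, b in pairs]
    return np.column_stack([X] + inter), names


def mccv_error(X, y, cols, n_splits=20, test_frac=0.25, seed=0):
    """Mean squared prediction error of a least-squares fit on columns `cols`,
    averaged over repeated random train/test splits (Monte Carlo CV).
    The fixed seed gives every candidate model the same splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])  # intercept first
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errors.append(np.mean((y[test] - A[test] @ beta) ** 2))
    return float(np.mean(errors))


def forward_stepwise(X, y, max_terms=5, **cv_kwargs):
    """Greedily add the column that most reduces the MCCV error;
    stop when no candidate improves on the current model."""
    selected = []
    best = mccv_error(X, y, selected, **cv_kwargs)
    while len(selected) < max_terms:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = {j: mccv_error(X, y, selected + [j], **cv_kwargs) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best


if __name__ == "__main__":
    # Synthetic data only: random "upstream" sequences with expression driven by one word.
    rng = np.random.default_rng(1)
    seqs = ["".join(rng.choice(list("ACGT"), size=50)) for _ in range(200)]
    X, names = design_matrix(seqs, ["ACG", "TTT", "GCA", "CAT"])
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=len(seqs))
    selected, err = forward_stepwise(X, y)
    print([names[j] for j in selected], err)
```

On the synthetic data in the demo, expression depends only on the count of one planted word, so that word should be the first feature selected. Real upstream regions would involve far more candidate words, which is why a cross-validated stopping rule matters.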
Pages: 1167-1175
Page count: 9
Related papers
30 in total
[1]   BAILEY TL, 1995, MACH LEARN, V21, P51, DOI 10.1007/BF00993379
[2]   BILU Y, 2001, P RECOMB MONTR CAN
[3]   SUBMODEL SELECTION AND EVALUATION IN REGRESSION - THE X-RANDOM CASE [J].
BREIMAN, L ;
SPECTOR, P .
INTERNATIONAL STATISTICAL REVIEW, 1992, 60 (03) :291-319
[4]   Breiman L., 1984, BIOMETRICS, DOI 10.2307/2530946
[5]   Regulatory element detection using correlation with expression [J].
Bussemaker, HJ ;
Li, H ;
Siggia, ED .
NATURE GENETICS, 2001, 27 (02) :167-171
[6]   CHIANG DY, 2001, BIOINFORMATICS, V17, P49
[7]   A genome-wide transcriptional analysis of the mitotic cell cycle [J].
Cho, RJ ;
Campbell, MJ ;
Winzeler, EA ;
Steinmetz, L ;
Conway, A ;
Wodicka, L ;
Wolfsberg, TG ;
Gabrielian, AE ;
Landsman, D ;
Lockhart, DJ ;
Davis, RW .
MOLECULAR CELL, 1998, 2 (01) :65-73
[8]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[9]   Identifying DNA and protein patterns with statistically significant alignments of multiple sequences [J].
Hertz, GZ ;
Stormo, GD .
BIOINFORMATICS, 1999, 15 (7-8) :563-577
[10]   Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae [J].
Hughes, JD ;
Estep, PW ;
Tavazoie, S ;
Church, GM .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 296 (05) :1205-1214