Learning rule-based models of biological process from gene expression time profiles using gene ontology

Cited by: 64
Authors
Hvidsten, TR
Lægreid, A
Komorowski, J [1]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7491 Trondheim, Norway
[2] Norwegian Univ Sci & Technol, Dept Clin & Mol Med, N-7489 Trondheim, Norway
[3] Uppsala Univ, Linnaeus Ctr Bioinformat, Uppsala, Sweden
DOI
10.1093/bioinformatics/btg047
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704
Abstract
Motivation: Microarray technology enables large-scale inference of the participation of genes in biological process from similar expression profiles. Our aim is to induce classificatory models from expression data and biological knowledge that can automatically associate genes with novel hypotheses of biological process.

Results: We report a systematic supervised learning approach to predicting biological process from time series of gene expression data and biological knowledge. Biological knowledge is expressed using gene ontology and this knowledge is associated with discriminatory expression-based features to form minimal decision rules. The resulting rule model is first evaluated on genes coding for proteins with known biological process roles using cross validation. Then it is used to generate hypotheses for genes for which no knowledge of participation in biological process could be found. The theoretical foundation for the methodology based on rough sets is outlined in the paper, and its practical application demonstrated on a data set previously published by Cho et al. (Nat. Genet., 27, 48-54, 2001).
Pages: 1116-1123
Number of pages: 8
Related papers
23 items in total
[1]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[2]  
BAZAN J, 1994, LECT NOTES ARTIF INT, V869, P346
[3]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[4]   Transcriptional regulation and function during the human cell cycle [J].
Cho, RJ ;
Huang, MX ;
Campbell, MJ ;
Dong, HL ;
Steinmetz, L ;
Sapinoso, L ;
Hampton, G ;
Elledge, SJ ;
Davis, RW ;
Lockhart, DJ .
NATURE GENETICS, 2001, 27 (01) :48-54
[5]   Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[6]   The KDD process for extracting useful knowledge from volumes of data [J].
Fayyad, U ;
Piatetsky-Shapiro, G ;
Smyth, P .
COMMUNICATIONS OF THE ACM, 1996, 39 (11) :27-34
[7]   euGenes: a eukaryote genome information system [J].
Gilbert, DG .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :145-148
[8]   THE MEANING AND USE OF THE AREA UNDER A RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE [J].
HANLEY, JA ;
MCNEIL, BJ .
RADIOLOGY, 1982, 143 (01) :29-36
[9]   EVALUATING THE YIELD OF MEDICAL TESTS [J].
HARRELL, FE ;
CALIFF, RM ;
PRYOR, DB ;
LEE, KL ;
ROSATI, RA .
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 1982, 247 (18) :2543-2546
[10]  
Hvidsten T R, 2001, Pac Symp Biocomput, P299