A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)

Cited by: 359
Authors
Troyanskaya, OG
Dolinski, K
Owen, AB
Altman, RB
Botstein, D
Affiliations
[1] Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
[2] Stanford Univ, Sch Med, Saccharomyces Genome Database, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
DOI
10.1073/pnas.0832373100
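Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract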
Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.
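The idea of weighting data sources by their relative accuracy and reporting a belief level can be illustrated with a minimal sketch. This is not MAGIC's actual Bayesian network: it assumes a simple naive-Bayes model in which sources are conditionally independent, and the source names, probabilities, gene names, and function names (`posterior_related`, `predict`, `LIKELIHOODS`) are all hypothetical values chosen for illustration.

```python
# Illustrative only: naive-Bayes combination of evidence about a gene pair.
# Source names and probabilities are hypothetical, not values from the paper.

# LIKELIHOODS[name] = (P(source reports link | related),
#                      P(source reports link | unrelated))
# A larger ratio between the two stands in for a more trusted source.
LIKELIHOODS = {
    "two_hybrid": (0.30, 0.02),
    "coexpression": (0.60, 0.20),
    "shared_tf_site": (0.40, 0.10),
}


def posterior_related(prior, evidence, likelihoods=LIKELIHOODS):
    """Return P(related | evidence), assuming conditionally independent sources.

    evidence[name] is True if that source reports a link, False otherwise.
    """
    odds = prior / (1.0 - prior)  # prior odds that the pair is related
    for name, observed in evidence.items():
        p_rel, p_unrel = likelihoods[name]
        if observed:
            odds *= p_rel / p_unrel
        else:
            odds *= (1.0 - p_rel) / (1.0 - p_unrel)
    return odds / (1.0 + odds)  # convert odds back to a probability


def predict(pairs, prior=0.05, threshold=0.5):
    """Keep gene pairs whose belief meets the chosen stringency threshold."""
    kept = {}
    for pair, evidence in pairs.items():
        belief = posterior_related(prior, evidence)
        if belief >= threshold:
            kept[pair] = belief
    return kept


if __name__ == "__main__":
    pairs = {
        ("GENE_A", "GENE_B"): {"two_hybrid": True, "coexpression": True, "shared_tf_site": True},
        ("GENE_A", "GENE_C"): {"two_hybrid": False, "coexpression": True, "shared_tf_site": False},
    }
    for pair, belief in predict(pairs).items():
        print(pair, round(belief, 3))
```

Raising `threshold` keeps only pairs supported by several or more reliable sources, which mirrors the way a belief level lets the user vary the stringency of predictions.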
Pages: 8348-8353
Page count: 6
Related papers
31 entries in total
[1]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[2]   Analyzing yeast protein-protein interaction data obtained from different sources [J].
Bader, GD ;
Hogue, CWV .
NATURE BIOTECHNOLOGY, 2002, 20 (10) :991-997
[3]   USE OF A SCREEN FOR SYNTHETIC LETHAL AND MULTICOPY SUPPRESSEE MUTANTS TO IDENTIFY 2 NEW GENES INVOLVED IN MORPHOGENESIS IN SACCHAROMYCES-CEREVISIAE [J].
BENDER, A ;
PRINGLE, JR .
MOLECULAR AND CELLULAR BIOLOGY, 1991, 11 (03) :1295-1305
[4]   The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J].
Boeckmann, B ;
Bairoch, A ;
Apweiler, R ;
Blatter, MC ;
Estreicher, A ;
Gasteiger, E ;
Martin, MJ ;
Michoud, K ;
O'Donovan, C ;
Phan, I ;
Pilbout, S ;
Schneider, M .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :365-370
[5]   The GRID: The General Repository for Interaction Datasets [J].
Breitkreutz, BJ ;
Stark, C ;
Tyers, M .
GENOME BIOLOGY, 2003, 4 (03)
[6]   Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) [J].
Dwight, SS ;
Harris, MA ;
Dolinski, K ;
Ball, CA ;
Binkley, G ;
Christie, KR ;
Fisk, DG ;
Issel-Tarver, L ;
Schroeder, M ;
Sherlock, G ;
Sethuraman, A ;
Weng, S ;
Botstein, D ;
Cherry, JM .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :69-72
[7]   A NOVEL GENETIC SYSTEM TO DETECT PROTEIN PROTEIN INTERACTIONS [J].
FIELDS, S ;
SONG, OK .
NATURE, 1989, 340 (6230) :245-246
[8]   Using Bayesian networks to analyze expression data [J].
Friedman, N ;
Linial, M ;
Nachman, I ;
Pe'er, D .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2000, 7 (3-4) :601-620
[9]   Genomic expression programs in the response of yeast cells to environmental changes [J].
Gasch, AP ;
Spellman, PT ;
Kao, CM ;
Carmel-Harel, O ;
Eisen, MB ;
Storz, G ;
Botstein, D ;
Brown, PO .
MOLECULAR BIOLOGY OF THE CELL, 2000, 11 (12) :4241-4257
[10]  
Heckerman D., 1991, PROBABILISTIC SIMILA