A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

Cited by: 174
Authors
Pena-Castillo, Lourdes [1]
Tasan, Murat [2]
Myers, Chad L. [3,4]
Lee, Hyunju [5]
Joshi, Trupti [6,7]
Zhang, Chao [6,7]
Guan, Yuanfang [3,4]
Leone, Michele [8]
Pagnani, Andrea [8]
Kim, Wan Kyu [9]
Krumpelman, Chase [10]
Tian, Weidong [2]
Obozinski, Guillaume [11]
Qi, Yanjun [12]
Mostafavi, Sara [13]
Lin, Guan Ning [6,7]
Berriz, Gabriel F. [2]
Gibbons, Francis D. [2]
Lanckriet, Gert [14]
Qiu, Jian [15]
Grant, Charles [15]
Barutcuoglu, Zafer [16]
Hill, David P. [17]
Warde-Farley, David [13]
Grouios, Chris [1]
Ray, Debajyoti [18]
Blake, Judith A. [17]
Deng, Minghua [19,20]
Jordan, Michael I. [21,22]
Noble, William S. [15,23]
Morris, Quaid [1,13,24]
Klein-Seetharaman, Judith [25]
Bar-Joseph, Ziv [12]
Chen, Ting [26]
Sun, Fengzhu [26]
Troyanskaya, Olga G. [3,4]
Marcotte, Edward M. [9]
Xu, Dong [6,7]
Hughes, Timothy R. [1,24]
Roth, Frederick P. [2,27]
Affiliations
[1] Univ Toronto, Donnelly Ctr Cellular & Biomol Res, Toronto, ON M5S 3E1, Canada
[2] Harvard Univ, Sch Med, Dept Biol Chem & Mol Pharmacol, Boston, MA 02115 USA
[3] Princeton Univ, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[4] Princeton Univ, Dept Mol Biol, Princeton, NJ 08544 USA
[5] Gwangju Inst Sci & Technol, Dept Informat & Commun, Kwangju 500712, South Korea
[6] Univ Missouri, Dept Comp Sci, Digital Biol Lab, Columbia, MO 65211 USA
[7] Univ Missouri, Christopher S Bond Life Sci Ctr, Columbia, MO 65211 USA
[8] ISI Fdn, I-10133 Turin, Italy
[9] Univ Texas Austin, Inst Cellular & Mol Biol, Ctr Syst & Synthet Biol, Austin, TX 78712 USA
[10] Univ Texas Austin, Inst Cellular & Mol Biol, Dept Elect & Comp Engn, Austin, TX 78712 USA
[11] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[12] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[13] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[14] Univ Calif San Diego, Dept Elect & Comp Engn, La Jolla, CA 92093 USA
[15] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[16] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[17] Jackson Lab, Bar Harbor, ME 04609 USA
[18] Gatsby Computat Neurosci Unit, London WC1N 3AR, England
[19] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[20] Peking Univ, Ctr Theoret Biol, Beijing 100871, Peoples R China
[21] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[22] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[23] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[24] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON M5S 3E1, Canada
[25] Univ Pittsburgh, Sch Med, Dept Biol Struct, Pittsburgh, PA 15260 USA
[26] Univ So Calif, Dept Biol Sci, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[27] Dana Farber Canc Inst, Ctr Canc Syst Biol, Boston, MA 02115 USA
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
10.1186/gb-2008-9-s1-s2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.

Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.

Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
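The abstract reports precision at a fixed recall of 20%. As a reading aid, here is a minimal sketch of how such a figure could be computed for one GO term from ranked prediction scores. The function name `precision_at_recall`, the toy scores and labels, and the tie-handling (stable sort order) are illustrative assumptions; the abstract does not describe the study's actual evaluation procedure, which may differ.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float],
                        labels: Sequence[int],
                        target_recall: float = 0.20) -> float:
    """Precision at the first rank cutoff whose recall reaches target_recall.

    scores: prediction confidence per gene (higher = more confident).
    labels: 1 if the gene is truly annotated with the GO term, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive gene")
    # Rank genes from most to least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        # Recall = fraction of all positives recovered so far.
        if true_pos / total_pos >= target_recall:
            # Precision = fraction of genes above the cutoff that are positive.
            return true_pos / rank
    return total_pos / len(ranked)  # only reached if target_recall > 1


# Hypothetical toy data: 10 genes, 5 of which truly carry the term.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(precision_at_recall(toy_scores, toy_labels, 0.20))
```

An average such as the reported 41% would then correspond to taking the mean of this per-term value over all evaluated GO terms, assuming a per-term averaging scheme.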
Pages: 19