An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Cited by: 256
Authors
Jiang, Yuxiang [1]
Oron, Tal Ronnen [2]
Clark, Wyatt T. [3]
Bankapur, Asma R. [4]
D'Andrea, Daniel [5]
Lepore, Rosalba [5]
Funk, Christopher S. [6]
Kahanda, Indika [7]
Verspoor, Karin M. [8, 9]
Ben-Hur, Asa [7]
Koo, Da Chen Emily [10]
Penfold-Brown, Duncan [11, 12]
Shasha, Dennis [13]
Youngs, Noah [12, 13, 14]
Bonneau, Richard [13, 14, 15]
Lin, Alexandra [16]
Sahraeian, Sayed M. E. [17]
Martelli, Pier Luigi [18]
Profiti, Giuseppe [18]
Casadio, Rita [18]
Cao, Renzhi [19]
Zhong, Zhaolong [19]
Cheng, Jianlin [19]
Altenhoff, Adrian [20, 21]
Skunca, Nives [20, 21]
Dessimoz, Christophe [22, 89, 90]
Dogan, Tunca [23]
Hakala, Kai [24, 25]
Kaewphan, Suwisa [24, 25, 26]
Mehryary, Farrokh [24, 25]
Salakoski, Tapio [24, 26]
Ginter, Filip [24]
Fang, Hai [27]
Smithers, Ben [27]
Oates, Matt [27]
Gough, Julian [27]
Toronen, Petri [28]
Koskinen, Patrik [28]
Holm, Liisa [28, 88]
Chen, Ching-Tai [29]
Hsu, Wen-Lian [29]
Bryson, Kevin [22]
Cozzetto, Domenico [22]
Minneci, Federico [22]
Jones, David T. [22]
Chapman, Samuel [30]
Dukka, B. K. C. [30]
Khan, Ishita K. [31]
Kihara, Daisuke [31, 87]
Ofer, Dan [32]
Affiliations
[1] Indiana Univ, Bloomington, IN 47405 USA
[2] Buck Inst Res Aging, Novato, CA USA
[3] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT USA
[4] Miami Univ, Dept Microbiol, Oxford, OH 45056 USA
[5] Univ Roma La Sapienza, Rome, Italy
[6] Univ Colorado, Sch Med, Computat Biosci Program, Aurora, CO USA
[7] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
[8] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic, Australia
[9] Univ Melbourne, Hlth & Biomed Informat Ctr, Parkville, Vic, Australia
[10] NYU, Dept Biol, New York, NY 10003 USA
[11] NYU, Social Media & Polit Participat Lab, New York, NY USA
[12] CY Data Sci, New York, NY USA
[13] NYU, Dept Comp Sci, New York, NY USA
[14] Simons Ctr Data Anal, New York, NY USA
[15] NYU, Dept Biol, Ctr Genom & Syst Biol, New York, NY 10003 USA
[16] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[17] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
[18] Univ Bologna, BiGeA, Biocomp Grp, Bologna, Italy
[19] Univ Missouri, Dept Comp Sci, Columbia, MO USA
[20] Swiss Fed Inst Technol, Zurich, Switzerland
[21] Swiss Inst Bioinformat, Zurich, Switzerland
[22] UCL, Dept Comp Sci, Bioinformat Grp, London, England
[23] European Bioinformat Inst, European Mol Biol Lab, Cambridge, England
[24] Univ Turku, Dept Informat Technol, Turku, Finland
[25] Univ Turku, Grad Sch, Turku, Finland
[26] Turku Ctr Comp Sci, Turku, Finland
[27] Univ Bristol, Bristol, Avon, England
[28] Univ Helsinki, Inst Biotechnol, Helsinki, Finland
[29] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
[30] North Carolina A&T State Univ, Dept Computat Sci & Engn, Greensboro, NC USA
[31] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[32] Hebrew Univ Jerusalem, Inst Life Sci, Dept Biol Chem, Jerusalem, Israel
[33] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, Jerusalem, Israel
[34] Imperial Coll London, Ctr Integrat Syst Biol & Bioinformat, Dept Life Sci, London, England
[35] UCL, Inst Cardiovasc Sci, Ctr Cardiovasc Genet, London, England
[36] Katholieke Univ Leuven, STADIUS Ctr Dynam Syst Signal Proc & Data Analyt, Dept Elect Engn, Leuven, Belgium
[37] IMinds Dept Med Informat Technol, Leuven, Belgium
[38] Canc Res Ctr Lyon, INSERM, CNRS, UMR S1052,UMR5286, Lyon, France
[39] Univ Lyon 1, Villeurbanne, France
[40] Ctr Leon Berard, Lyon, France
[41] UCL, Inst Struct & Mol Biol, London, England
[42] Cerenode Inc, Boston, MA USA
[43] Molde Univ Coll, Molde, Norway
[44] Royal Holloway Univ London, Ctr Syst & Synthet Biol, Dept Comp Sci, Egham, Surrey, England
[45] Univ Calif Los Angeles, Dept Mol Cell & Dev Biol, Los Angeles, CA USA
[46] Natl Univ Ireland, Sch Math Stat & Appl Math, Galway, Ireland
[47] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, New York, NY USA
[48] Univ British Columbia, Grad Program Bioinformat, Vancouver, BC, Canada
[49] Univ British Columbia, Dept Psychiat, Vancouver, BC, Canada
[50] Univ British Columbia, Michael Smith Labs, Vancouver, BC, Canada
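Source
GENOME BIOLOGY | 2016 / Vol. 17
Funding
Swiss National Science Foundation; US National Institutes of Health; São Paulo Research Foundation (Brazil); National Natural Science Foundation of China; UK Biotechnology and Biological Sciences Research Council; Australian Research Council; Natural Sciences and Engineering Research Council of Canada; Academy of Finland; US National Science Foundation;
Keywords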
Protein function prediction; Disease gene prioritization; DISINTEGRIN; ONTOLOGY; ADAM;
DOI
10.1186/s13059-016-1037-6
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005; 0836; 090102; 100705;
Abstract
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.
Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Pages: 19