Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction

Cited by: 119
Authors
Jansen, R
Gerstein, M
Affiliations
[1] Mem Sloan Kettering Canc Ctr, Computat Biol Ctr, New York, NY 10021 USA
[2] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[3] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
DOI
10.1016/j.mib.2004.08.012
Chinese Library Classification number
Q93 [Microbiology]
Subject classification codes
071005; 100705
Abstract
The concept of 'protein function' is rather 'fuzzy' because it is often based on whimsical terms or contradictory nomenclature. This currently presents a challenge for functional genomics because precise definitions are essential for most computational approaches. Addressing this challenge, the notion of networks between biological entities (including molecular and genetic interaction networks as well as transcriptional regulatory relationships) potentially provides a unifying language suitable for the systematic description of protein function. Predicting the edges in protein networks requires reference sets of examples with known outcome (that is, 'gold standards'). Such reference sets should ideally include positive examples - as is now widely appreciated - but also, equally importantly, negative ones. Moreover, it is necessary to consider the expected relative occurrence of positives and negatives because this affects the misclassification rates of experiments and computational predictions. For instance, a reason why genome-wide, experimental protein-protein interaction networks have high inaccuracies is that the prior probability of finding interactions (positives) rather than non-interacting protein pairs (negatives) in unbiased screens is very small. These problems can be addressed by constructing well-defined sets of non-interacting proteins from subcellular localization data, which allows computing the probability of interactions based on evidence from multiple datasets.
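The abstract's point about prior probabilities can be illustrated with a short sketch. It is not the authors' implementation: it assumes the evidence sources are conditionally independent (a naive Bayes simplification), and the prior, sensitivity, false-positive rate and likelihood ratios are hypothetical values chosen only for illustration.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine evidence with Bayes' rule in odds form.

    Assumes the evidence sources are conditionally independent, so their
    likelihood ratios multiply (naive Bayes simplification).
    """
    posterior_odds = prior / (1.0 - prior)  # prior odds of an interaction
    for lr in likelihood_ratios:
        posterior_odds *= lr
    return posterior_odds / (1.0 + posterior_odds)


def false_discovery_rate(prior, sensitivity, false_positive_rate):
    """Fraction of reported positives that are actually negatives."""
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * false_positive_rate
    return false_positives / (true_positives + false_positives)


# Hypothetical values: roughly 1 in 600 protein pairs interact, and a single
# screen detects 90% of true interactions with a 1% false-positive rate.
prior = 1.0 / 600.0
fdr = false_discovery_rate(prior, sensitivity=0.9, false_positive_rate=0.01)

# The same screen expressed as a likelihood ratio (0.9 / 0.01 = 90), first
# alone and then combined with two further hypothetical evidence sources.
single = posterior_probability(prior, [90.0])
combined = posterior_probability(prior, [90.0, 20.0, 15.0])

print(f"false discovery rate of a single screen: {fdr:.3f}")
print(f"posterior with one evidence source: {single:.3f}")
print(f"posterior with three evidence sources: {combined:.3f}")
```

With these illustrative numbers, most positives reported by a single screen are false even though the screen itself is fairly accurate, because negatives vastly outnumber positives. Multiplying likelihood ratios from several datasets is one way to raise the posterior probability. Estimating those ratios requires both positive and negative gold standards.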
Pages: 535-545
Page count: 11