A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets

Cited by: 329
Authors
Savitski, Mikhail M. [1]
Wilhelm, Mathias [2,3]
Hahne, Hannes [2]
Kuster, Bernhard [2,4]
Bantscheff, Marcus [1]
Affiliations
[1] Cellzome GmbH, D-69117 Heidelberg, Germany
[2] Technical University of Munich, Chair of Proteomics and Bioanalytics, D-85354 Freising-Weihenstephan, Germany
[3] SAP SE, D-69190 Walldorf, Germany
[4] Center for Integrated Protein Science Munich, D-85354 Freising-Weihenstephan, Germany
Keywords
TANDEM MASS-SPECTROMETRY; SHOTGUN PROTEOMICS; IDENTIFICATION; SEARCH; CELLS; PERFORMANCE; CONFIDENCE; QUADRUPOLE; PEPTIDES; PROJECT
DOI
10.1074/mcp.M114.046995
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Calculating the number of confidently identified proteins and estimating the false discovery rate (FDR) is challenging when analyzing very large proteomic data sets, such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments adds to the challenge; moreover, opinions differ strongly on the conceptual validity of a protein FDR, and there is no consensus on the methodology for determining it. The widely used classic target-decoy strategy also has inherent limitations that become apparent in very large data sets, where they lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic and of a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection of approximately 19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats the target and decoy sequences of the same protein as a pair rather than as individual entities, and retains either the target or the decoy sequence, whichever receives the higher score. We evaluated this approach in combination with q-value-based peptide scoring, which normalizes sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when each protein was scored by its best peptide q-value, yielding a stable number of true-positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes the number of false-positive protein identifications to be overpredicted in large data sets. The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications, and is readily implemented in proteomics analysis software.
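The abstract contrasts the classic and the "picked" target-decoy protein FDR schemes. The following is a minimal Python sketch of that difference, not the authors' implementation. It assumes that each protein is scored by its best (lowest) peptide q-value, that each decoy is keyed by the accession of the target it was generated from, that FDR is estimated as #decoys / #targets, that ties within a pair go to the decoy, and that protein grouping and shared peptides are ignored. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

# (score, is_decoy, accession); score = best peptide q-value, lower is better
Entry = Tuple[float, bool, str]


def best_peptide_qvalue(peptide_qs: Dict[str, List[float]]) -> Dict[str, float]:
    """Protein score = the best (lowest) q-value among its peptides."""
    return {acc: min(qs) for acc, qs in peptide_qs.items() if qs}


def target_decoy_qvalues(entries: List[Entry]) -> List[Tuple[float, bool, str, float]]:
    """Rank best-first, estimate FDR as #decoys / #targets at each rank,
    then make it monotone (q-value = minimum FDR at this rank or below)."""
    ranked = sorted(entries, key=lambda e: e[0])
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy, _ in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    qvals, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(s, d, acc, q) for (s, d, acc), q in zip(ranked, qvals)]


def classic_protein_fdr(target: Dict[str, float], decoy: Dict[str, float]):
    """Classic: every target and every decoy protein enters the ranking."""
    entries = [(q, False, acc) for acc, q in target.items()]
    entries += [(q, True, acc) for acc, q in decoy.items()]
    return target_decoy_qvalues(entries)


def picked_protein_fdr(target: Dict[str, float], decoy: Dict[str, float]):
    """Picked: a target and its decoy form a pair; only the better-scoring
    member of each pair enters the ranking."""
    inf = float("inf")
    entries = []
    for acc in sorted(set(target) | set(decoy)):
        t, d = target.get(acc, inf), decoy.get(acc, inf)
        # Tie -> keep the decoy (a conservative choice; an assumption here).
        entries.append((d, True, acc) if d <= t else (t, False, acc))
    return target_decoy_qvalues(entries)


def count_targets(results, threshold: float) -> int:
    """Number of target proteins accepted at the given protein q-value."""
    return sum(1 for _, is_decoy, _, q in results if not is_decoy and q <= threshold)


# Hypothetical toy data: peptide q-values per protein accession.
target = best_peptide_qvalue({
    "P1": [0.001, 0.02], "P2": [0.002], "P3": [0.003],
    "P4": [0.004], "P5": [0.02], "P6": [0.03],
})
# Decoys are keyed by the accession of the target they were derived from.
decoy = best_peptide_qvalue({"P1": [0.01], "P2": [0.011], "P3": [0.012], "P7": [0.05]})

print(count_targets(classic_protein_fdr(target, decoy), 0.01))  # 4
print(count_targets(picked_protein_fdr(target, decoy), 0.01))   # 6
```

In this toy example, the decoys of three confidently identified targets (P1 to P3) enter the classic ranking and inflate the decoy count, whereas the picked scheme discards them because their paired targets score better. As a result, the classic estimate accepts fewer targets at the same threshold. This mirrors, on a tiny scale, the decoy over-representation described in the abstract; the numbers are illustrative only.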
Pages: 2394-2404
Number of pages: 11