False Discovery Rates of Protein Identifications: A Strike against the Two-Peptide Rule

Cited by: 150
Authors
Gupta, Nitin [1]
Pevzner, Pavel A. [2]
Affiliations
[1] Univ Calif San Diego, Bioinformat Program, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Comp Sci, La Jolla, CA 92093 USA
Keywords
two-peptide rule; false discovery rate; mass spectrometry; peptide identification; protein identification; decoy database; false positives; MASS-SPECTROMETRY; IDENTIFYING PROTEINS; STATISTICAL-MODEL; TANDEM; PEPTIDE;
DOI
10.1021/pr9004794
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Most proteomics studies attempt to maximize the number of peptide identifications and subsequently infer proteins containing two or more peptides as reliable protein identifications. In this study, we evaluate the effect of this "two-peptide" rule on protein identifications, using multiple search tools and data sets. Contrary to intuition, the "two-peptide" rule reduces the number of protein identifications in the target database more significantly than in the decoy database and results in increased false discovery rates, compared to the case when single-hit proteins are not discarded. We therefore recommend that the "two-peptide" rule be abandoned and that protein identifications instead be subject to the estimation of error rates, as is the case with peptide identifications. We further extend the generating function approach (originally proposed for evaluating matches between a peptide and a single spectrum) to evaluating matches between a protein and an entire spectral data set.
Pages: 4173-4181
Number of pages: 9