Estimating the statistical significance of peptide identifications from shotgun proteomics experiments

Cited by: 53
Authors
Higgs, Richard E. [1]
Knierman, Michael D. [1]
Freeman, Angela Bonner [1]
Gelbert, Lawrence M. [1]
Patil, Sandeep T. [1]
Hale, John E. [1]
Affiliations
[1] Lilly Corp Ctr, Lilly Res Labs, Indianapolis, IN 46285 USA
Keywords
peptide identification; false discovery rate; Sequest; X! Tandem; statistical significance; proteomics
DOI
10.1021/pr0605320
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We present a wrapper-based approach to estimate and control the false discovery rate for peptide identifications using the outputs from multiple commercially available MS/MS search engines. Features of the approach include the flexibility to combine output from multiple search engines with sequence and spectral derived features in a flexible classification model to produce a score associated with correct peptide identifications. This classification model score from a reversed database search is taken as the null distribution for estimating p-values and false discovery rates using a simple and established statistical procedure. Results from 10 analyses of rat sera on an LTQ-FT mass spectrometer indicate that the method is well calibrated for controlling the proportion of false positives in a set of reported peptide identifications while correctly identifying more peptides than rule-based methods using one search engine alone.
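The null-distribution step described in the abstract can be illustrated with a minimal sketch. It assumes that each peptide-spectrum match already has a classifier score (higher meaning more likely correct), that the scores from the reversed database search serve as the null sample, and that the "simple and established statistical procedure" is the Benjamini-Hochberg step-up adjustment. That last choice is an assumption; the abstract does not name the procedure. The function names, the add-one correction, and the example scores are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def empirical_p_values(target_scores, decoy_scores):
    """Estimate a p-value for each target score from the decoy (null) scores.

    The p-value is the fraction of decoy scores at least as large as the
    target score. The add-one correction is an assumption of this sketch;
    it keeps p-values strictly positive.
    """
    decoys = sorted(decoy_scores)
    n = len(decoys)
    p_values = []
    for score in target_scores:
        n_at_least = n - bisect_left(decoys, score)  # decoys with score >= target
        p_values.append((n_at_least + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical classifier scores, for illustration only.
    decoy = [0.1, 0.2, 0.3, 0.4]
    target = [0.5, 0.25, 0.05]
    p = empirical_p_values(target, decoy)
    q = benjamini_hochberg(p)
    for s, pv, qv in zip(target, p, q):
        print(f"score={s:.2f}  p={pv:.3f}  adjusted={qv:.3f}")
```

Identifications whose adjusted value falls below a chosen threshold (for example 0.05) would be reported; with realistic data the decoy sample would contain thousands of scores rather than four.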
Pages: 1758-1767
Page count: 10
Related papers
42 in total
[1]   A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: Support vector machine classification of peptide MS/MS spectra and SEQUEST scores [J].
Anderson, DC ;
Li, WQ ;
Payan, DG ;
Noble, WS .
JOURNAL OF PROTEOME RESEARCH, 2003, 2 (02) :137-146
[2]  
[Anonymous], 2001, Bioinformatics
[3]   Artificial neural network analysis for evaluation of peptide MS/MS spectra in proteomics [J].
Baczek, T ;
Bucinski, A ;
Ivanov, AR ;
Kaliszan, R .
ANALYTICAL CHEMISTRY, 2004, 76 (06) :1726-1732
[4]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   Automatic Quality Assessment of Peptide Tandem Mass Spectra [J].
Bern, Marshall ;
Goldberg, David ;
McDonald, W. Hayes ;
Yates, John R., III .
BIOINFORMATICS, 2004, 20 :49-54
[6]   Reporting protein identification data - The next generation of guidelines [J].
Bradshaw, RA ;
Burlingame, AL ;
Carr, S ;
Aebersold, R .
MOLECULAR & CELLULAR PROTEOMICS, 2006, 5 (05) :787-788
[7]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[8]   Comparison of probability and likelihood models for peptide identification from tandem mass spectrometry data [J].
Cannon, WR ;
Jarman, KH ;
Webb-Robertson, BJM ;
Baxter, DJ ;
Oehmen, CS ;
Jarman, KD ;
Heredia-Langner, A ;
Auberry, KJ ;
Anderson, GA .
JOURNAL OF PROTEOME RESEARCH, 2005, 4 (05) :1687-1698
[9]   Epigenetic targets in hematopoietic malignancies [J].
Claus, R ;
Lübbert, M .
ONCOGENE, 2003, 22 (42) :6489-6496
[10]   A method for reducing the time required to match protein sequences with tandem mass spectra [J].
Craig, R ;
Beavis, RC .
RAPID COMMUNICATIONS IN MASS SPECTROMETRY, 2003, 17 (20) :2310-2316