A predictive model for identifying proteins by a single peptide match

Cited by: 52
Authors
Higdon, Roger
Kolker, Eugene [1]
Affiliations
[1] BIATECH Inst, Bothell, WA 98011 USA
[2] Univ Washington, Div Biomed & Hlth Informat, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
DOI
10.1093/bioinformatics/btl595
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: Tandem mass spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identifications, at least two unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified by only a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and the use of logistic regression models with cross-validation. The approach was applied to three bacterial samples, enabling recovery of 68-98% of the correct single-hit proteins with an error rate of < 2%. This results in a 22-65% increase in the number of identified proteins. Identifying true single-hit proteins will lead to the discovery of many crucial regulators, biomarkers and other low-abundance proteins.
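The abstract names the two ingredients of the method (randomized database searching and logistic regression with cross-validation) without giving details. The following is only a minimal sketch of how those two ideas can fit together, not the authors' implementation. It assumes that each single-hit match is described by two hypothetical standardized score features, that matches to the randomized database are labeled 0 and presumed-correct matches are labeled 1, and that the data are synthetic. The functions `fit_logistic`, `cross_val_proba` and `make_synthetic` are illustrative names, and the model is fitted by plain gradient descent.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean logistic loss.
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def cross_val_proba(X, y, k=5, seed=0):
    # k-fold cross-validation: every match is scored by a model
    # that never saw that match during fitting.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    out = [None] * len(X)
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            out[i] = predict_proba(w, X[i])
    return out


def make_synthetic(n=200, seed=1):
    # Synthetic stand-in data (assumption): label 0 = match to the
    # randomized database, label 1 = presumed-correct match.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    probs = cross_val_proba(X, y)
    accepted = [t for p, t in zip(probs, y) if p >= 0.9]
    false_hits = sum(1 for t in accepted if t == 0)
    print(len(accepted), "accepted;", false_hits, "from the randomized database")
```

In a sketch like this, the share of randomized-database matches among the accepted single hits gives a rough handle on the error rate at a chosen probability cutoff. The cutoff of 0.9 is arbitrary and is not taken from the paper.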
Pages: 277-280
Number of pages: 4