Calibrating E-values for MS2 database search methods

Cited by: 13
Authors
Alves, Gelio [1]
Ogurtsov, Aleksey Y. [1]
Wu, Wells W. [2]
Wang, Guanghui [2]
Shen, Rong-Fong
Yu, Yi-Kuo [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[2] NHLBI, Proteom Core Facil, Bethesda, MD 20892 USA
DOI
10.1186/1745-6150-2-26
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: The key to mass-spectrometry-based proteomics is peptide identification, which relies on software analysis of tandem mass spectra. Although each search engine has its strengths, combining the strengths of various search engines is not yet feasible, largely because no unified statistical framework applies to every method. Results: We have developed a universal scheme for statistical calibration of peptide identifications. The protocol can be used for de novo approaches as well as database search methods; we demonstrate it using database search methods only. Of the seven methods calibrated (SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X!Tandem (v1.0), OMSSA (v2.0) and RAId_DbS), most, with the exception of X!Tandem and RAId_DbS, require a rescaling according to the size of the database searched. We demonstrate that our calibration protocol indeed produces unified statistics, both in terms of the average number of false positives and in terms of the probability that a peptide hit is a true positive. Although the calibration protocol and the statistics it produces are universal, the calibration formulas obtained by one laboratory, with data collected in either centroid or profile format, may not be directly usable by other laboratories. Each laboratory is therefore encouraged to calibrate the search methods it intends to use. We also address the importance of using spectrum-specific statistics and possible improvements to the current calibration protocol. The spectra used for statistical (E-value) calibration are freely available upon request. Open peer review: Reviewed by Dongxiao Zhu (nominated by Arcady Mushegian), Alexey Nesvizhskii (nominated by King Jordan) and Vineet Bafna. For the full reviews, please go to the Reviewers' comments section.
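As an illustration of what calibration "in terms of average number of false positives" can mean, the sketch below checks a set of reported E-values against the textbook definition of an E-value: for a well-calibrated method, the average number of false hits per spectrum with reported E-value at or below a cutoff E should be close to E itself. This is not the authors' protocol; the function name, the pooled list of known-false hits and the toy numbers are all assumptions made for illustration.

```python
from bisect import bisect_right


def observed_false_positives_per_spectrum(false_hit_evalues, num_spectra, cutoffs):
    """For each cutoff, return the average number of known-false hits per
    spectrum whose reported E-value is at or below that cutoff."""
    ordered = sorted(false_hit_evalues)
    # bisect_right counts how many sorted E-values are <= cutoff.
    return [bisect_right(ordered, cutoff) / num_spectra for cutoff in cutoffs]


# Hypothetical data: reported E-values of known-false hits pooled over 4 spectra.
false_hit_evalues = [0.3, 0.8, 1.5, 2.0, 2.5, 4.0, 6.0, 9.0]
cutoffs = [1.0, 2.0, 4.0]

observed = observed_false_positives_per_spectrum(false_hit_evalues, 4, cutoffs)
for cutoff, obs in zip(cutoffs, observed):
    # For a calibrated method the two numbers on each line would roughly agree.
    print(f"E-value cutoff {cutoff}: observed {obs:.2f} false hits per spectrum")
```

In this toy data set the observed averages fall below the cutoffs, so the reported E-values would be conservative; a systematic gap of this kind is what a calibration formula is meant to remove.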
Pages: 14