Enhancing peptide identification confidence by combining search methods

Cited by: 52
Authors
Alves, Gelio [1]
Wu, Wells W. [2]
Wang, Guanghui [2]
Shen, Rong-Fong [2]
Yu, Yi-Kuo [1]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[2] NHLBI, Prote Core Facil, NIH, Bethesda, MD 20892 USA
DOI
10.1021/pr700798h
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
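The combination counts in the abstract, and the general idea of merging independent significance values, can be illustrated with a minimal sketch. It assumes each search method's score has already been converted to a p-value and that the methods are independent, and it uses Fisher's method as a stand-in; the abstract does not state the exact combining formula, so this is not necessarily the authors' procedure. The names `METHODS` and `fisher_combined_p` are hypothetical.

```python
import math
from itertools import combinations

# The seven search methods named in the abstract.
METHODS = ["SEQUEST", "ProbID", "InsPecT", "Mascot",
           "X! Tandem", "OMSSA", "RAId_DbS"]


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Assumption: this is a generic illustration, not the paper's formula.
    Fisher's statistic -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of freedom
    the tail probability has the closed form
        exp(-t) * sum_{j=0}^{k-1} t**j / j!,  with t = -sum(ln p_i).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    t = -sum(math.log(p) for p in p_values)  # half of Fisher's statistic
    term = 1.0   # current series term t**j / j!
    total = 1.0  # running sum, starting with the j = 0 term
    for j in range(1, len(p_values)):
        term *= t / j
        total += term
    return math.exp(-t) * total


# Enumerate the method combinations tested in the study.
pairs = list(combinations(METHODS, 2))
triples = list(combinations(METHODS, 3))
print(len(pairs), len(triples))

# Hypothetical p-values reported by two methods for the same peptide.
print(fisher_combined_p([0.05, 0.05]))
```

With a single input the function returns that p-value unchanged, which is a quick sanity check on the closed form.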
Pages: 3102-3113
Page count: 12