Adaptive Discriminant Function Analysis and Reranking of MS/MS Database Search Results for Improved Peptide Identification in Shotgun Proteomics

Cited by: 35
Authors
Ding, Ying [1, 2]
Choi, Hyungwon [1, 2]
Nesvizhskii, Alexey I. [1, 3]
Affiliations
[1] Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Ctr Computat Biol & Med, Ann Arbor, MI 48109 USA
Keywords
tandem mass spectrometry; database searching; peptide identification; statistical modeling; adaptive discriminant analysis; mass accuracy; decoy sequences;
DOI
10.1021/pr800484x
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
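The adaptive idea described in the abstract, replacing fixed discriminant coefficients with ones learned iteratively from the data, can be illustrated with a small sketch. This is not the authors' algorithm: the two generic score features, the use of decoy matches as the negative class, the decoy-quantile rule for picking presumed-correct target matches, and all function names (`fit_lda`, `adapt`, `simulate`) are assumptions made only for illustration.

```python
import random

def dot(w, x):
    """Linear discriminant score for a two-feature match."""
    return w[0] * x[0] + w[1] * x[1]

def fit_lda(pos, neg):
    """Fisher discriminant for two features: w = S^-1 (mean_pos - mean_neg)."""
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]
    mp, mn = mean(pos), mean(neg)
    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((pos, mp), (neg, mn)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mp[0] - mn[0], mp[1] - mn[1]]
    # Closed-form 2x2 inverse applied to the mean difference.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]

def decoy_threshold(w, decoys, q=0.99):
    """Score at quantile q of the decoy score distribution (assumed rule)."""
    scores = sorted(dot(w, d) for d in decoys)
    return scores[min(len(scores) - 1, int(q * len(scores)))]

def accepted(w, targets, decoys, q=0.99):
    """Target matches scoring above the decoy-derived threshold."""
    t = decoy_threshold(w, decoys, q)
    return [x for x in targets if dot(w, x) > t]

def adapt(targets, decoys, w_init, n_iter=5, q=0.99):
    """Re-learn the weights: presumed-correct targets vs. decoys, repeated."""
    w = list(w_init)
    for _ in range(n_iter):
        pos = accepted(w, targets, decoys, q)
        if len(pos) < 2:  # too few presumed-correct matches to refit
            break
        w = fit_lda(pos, decoys)
    return w

def simulate(seed=0, n=250):
    """Synthetic data: half of the targets are 'correct' (shifted features)."""
    rng = random.Random(seed)
    draw = lambda mu: [rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)]
    targets = [draw(3.0) for _ in range(n)] + [draw(0.0) for _ in range(n)]
    decoys = [draw(0.0) for _ in range(2 * n)]
    return targets, decoys
```

Calling `adapt(targets, decoys, [1.0, 0.0])` on the output of `simulate()` starts from fixed weights that ignore the second feature and returns weights refitted on the data. Comparing `len(accepted(...))` before and after gives a rough sense of how many target matches clear the same decoy-based threshold.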
Pages: 4878-4889
Page count: 12