Comparison of probability and likelihood models for peptide identification from tandem mass spectrometry data

Cited by: 23
Authors
Cannon, WR [1]
Jarman, KH [1]
Webb-Robertson, BJM [1]
Baxter, DJ [1]
Oehmen, CS [1]
Jarman, KD [1]
Heredia-Langner, A [1]
Auberry, KJ [1]
Anderson, GA [1]
Affiliations
[1] Pacific NW Natl Lab, Richland, WA 99352 USA
Keywords
tandem mass spectrometry; peptide identification; fragmentation model; likelihood; hypothesis test; support vector machine
DOI
10.1021/pr050147v
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We evaluate statistical models used in two-hypothesis tests for identifying peptides from tandem mass spectrometry data. The null hypothesis H0, that a peptide matches a spectrum by chance, requires information on the probability of by-chance matches between peptide fragments and peaks in the spectrum. Likewise, the alternate hypothesis HA, that the spectrum is due to a particular peptide, requires probabilities that the peptide fragments would indeed be observed if it were the causative agent. We compare models for these probabilities by determining the identification rates produced by the models using an independent data set. The initial models use different probabilities depending on fragment ion type, but uniform probabilities for each ion type across all of the labile bonds along the backbone. More sophisticated models for probabilities under both HA and H0 are introduced that do not assume uniform probabilities for each ion type. In addition, the performance of these models using a standard likelihood model is compared to an information theory approach derived from the likelihood model. Also, a simple but effective model for incorporating peak intensities is described. Finally, a support vector machine is used to discriminate between correct and incorrect identifications based on multiple characteristics of the scoring functions. The results are shown to reduce the misidentification rate significantly when compared to a benchmark cross-correlation-based approach.
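As a rough illustration of the two-hypothesis framing in the abstract, the following minimal sketch scores a candidate peptide by a log-likelihood ratio of HA against H0, using one uniform probability per fragment ion type as in the "initial models" the abstract mentions. It is not the authors' implementation: the function name, the data layout, and every probability value are hypothetical and chosen only for illustration, and the sketch ignores peak intensities and position-dependent probabilities.

```python
import math

def log_likelihood_ratio(fragments, p_alt, p_null):
    """Sum of per-fragment log-likelihood ratios log(P(obs | HA) / P(obs | H0)).

    fragments: list of (ion_type, matched) pairs, one per predicted fragment,
               where matched says whether a spectrum peak was found for it.
    p_alt:     hypothetical probability, per ion type, that a fragment is
               observed when the peptide really produced the spectrum (HA).
    p_null:    hypothetical probability, per ion type, that a fragment
               matches a peak purely by chance (H0).
    """
    score = 0.0
    for ion_type, matched in fragments:
        pa, p0 = p_alt[ion_type], p_null[ion_type]
        if matched:
            score += math.log(pa / p0)
        else:
            score += math.log((1.0 - pa) / (1.0 - p0))
    return score

# Illustrative values only; not taken from the paper.
P_ALT = {"b": 0.6, "y": 0.7}
P_NULL = {"b": 0.1, "y": 0.1}

# Two hypothetical candidates: one matching most predicted fragments, one matching few.
good_candidate = [("b", True), ("y", True), ("b", True), ("y", False)]
poor_candidate = [("b", False), ("y", False), ("b", True), ("y", False)]

good_score = log_likelihood_ratio(good_candidate, P_ALT, P_NULL)
poor_score = log_likelihood_ratio(poor_candidate, P_ALT, P_NULL)
print(good_score, poor_score)
```

Under these assumed values, a positive score favors HA and a negative score favors H0. Treating fragments as independent is a simplification made for the sketch.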
Pages: 1687-1698
Page count: 12