Improvements to the Percolator Algorithm for Peptide Identification from Shotgun Proteomics Data Sets

Cited by: 208
Authors
Spivak, Marina [2,3]
Weston, Jason [3]
Bottou, Léon [3]
Käll, Lukas [1,4]
Noble, William Stafford [1,5]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] NYU, Dept Comp Sci, New York, NY 10003 USA
[3] NEC Labs Amer, Princeton, NJ 08540 USA
[4] Univ Stockholm, Ctr Biomembrane Res, Dept Biochem & Biophys, Stockholm, Sweden
[5] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
shotgun proteomics; tandem mass spectrometry; machine learning; peptide identification; TANDEM MASS-SPECTROMETRY; PROTEIN IDENTIFICATION; SEARCH STRATEGY; VALIDATION; MS/MS; MODEL
DOI
10.1021/pr801109k
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions designed primarily to identify the best peptide for a given spectrum. Consequently, when identifications are compared across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, chosen for intuitive reasons. (2) Tractable nonlinear models are used instead of linear models, improving accuracy over the original Percolator. (3) A method, Q-ranker, is proposed that directly optimizes the number of spectra identified at a specified q value and achieves further gains.
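The third improvement targets the number of spectra identified at a specified q value. The sketch below illustrates only that target quantity, not Q-ranker's optimization procedure. It estimates q values from a target-decoy search and counts the target PSMs accepted at a threshold. The abstract does not specify the FDR estimator, so the sketch assumes a simple one: FDR at score threshold t ≈ (decoy PSMs scoring ≥ t) / (target PSMs scoring ≥ t). The function names `q_values` and `num_identified` and the toy scores are hypothetical.

```python
from typing import List


def q_values(scores: List[float], is_decoy: List[bool]) -> List[float]:
    """Estimate a q value for each PSM (assumed simple target-decoy estimator).

    Higher scores are assumed to indicate better PSMs.
    """
    n = len(scores)
    # Rank PSMs from best (highest score) to worst.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * n
    targets = decoys = 0
    k = 0
    while k < n:
        # Tied scores share one threshold, so count the whole tied block first.
        j = k
        while j < n and scores[order[j]] == scores[order[k]]:
            if is_decoy[order[j]]:
                decoys += 1
            else:
                targets += 1
            j += 1
        # Assumed estimator: decoys / targets above the threshold, capped at 1.
        block_fdr = min(1.0, decoys / max(targets, 1))
        for idx in order[k:j]:
            fdr[idx] = block_fdr
        k = j
    # q value = smallest estimated FDR at any threshold that still accepts
    # the PSM, i.e. a running minimum taken from the worst score upward.
    q = [0.0] * n
    running = float("inf")
    for idx in reversed(order):
        running = min(running, fdr[idx])
        q[idx] = running
    return q


def num_identified(scores: List[float], is_decoy: List[bool],
                   q_threshold: float = 0.01) -> int:
    """Count target PSMs whose estimated q value is at or below q_threshold."""
    q = q_values(scores, is_decoy)
    return sum(1 for qi, d in zip(q, is_decoy) if not d and qi <= q_threshold)


# Toy example (hypothetical scores): three targets rank above the first decoy.
toy_scores = [9.0, 8.5, 8.0, 7.5, 7.0, 6.5, 6.0, 5.5]
toy_decoy = [False, False, False, True, False, False, True, False]
print(num_identified(toy_scores, toy_decoy, 0.01))  # -> 3
```

In this framing, a method like Q-ranker would adjust the scoring function to increase `num_identified` at the chosen threshold. The sketch only evaluates the count for a fixed set of scores.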
Pages: 3737-3745
Page count: 9