A Ranking-Based Scoring Function for Peptide-Spectrum Matches

Cited by: 67
Authors
Frank, Ari M. [1]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Keywords
MS/MS; scoring; peptide; PSM; de novo; database search; machine learning; ranking; boosting; TANDEM MASS-SPECTROMETRY; INDUCED DISSOCIATION SPECTRA; HIDDEN MARKOV MODEL; PROTEIN IDENTIFICATION; DATABASE SEARCH; GENOME ANNOTATION; SEQUENCE DATABASES; POSTTRANSLATIONAL MODIFICATIONS; PROTEOMICS; ALGORITHM
DOI
10.1021/pr800678b
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data that is generated these days relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to scoring PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state of the art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as proteogenomic search of a six-frame translation of the human genome (in which we achieve a reduction of the running time by a factor of 15 and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
Pages: 2241-2252
Page count: 12