A Ranking-Based Scoring Function for Peptide-Spectrum Matches

Times cited: 65
Author
Frank, Ari M. [1]
Affiliation
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Keywords
MS/MS; scoring; peptide; PSM; de novo; database search; machine learning; ranking; boosting; TANDEM MASS-SPECTROMETRY; INDUCED DISSOCIATION SPECTRA; HIDDEN MARKOV MODEL; PROTEIN IDENTIFICATION; DATABASE SEARCH; GENOME ANNOTATION; SEQUENCE DATABASES; POSTTRANSLATIONAL MODIFICATIONS; PROTEOMICS; ALGORITHM;
DOI
10.1021/pr800678b
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data generated today relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to scoring PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative, boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state of the art and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as a proteogenomic search of a six-frame translation of the human genome (in which, compared to the InsPecT database search tool, we reduce the running time by a factor of 15 and increase the number of identified peptides by 60%). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
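The abstract frames PSM scoring as a ranking task solved with boosting. This record does not describe the actual models, so the following is only a minimal sketch under stated assumptions: it uses RankBoost-style pairwise boosting (Freund et al.) with single-feature threshold weak rankers, which may differ from what PepNovo+ really does. The two-feature vectors are hypothetical stand-ins for the paper's PSM feature functions, and `train_rankboost` and `score` are illustrative names, not part of any released tool.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]           # (worse candidate, better candidate)
WeakRanker = Tuple[float, int, float]  # (alpha, feature index, threshold)


def _h(x: Vector, f: int, theta: float) -> int:
    """Threshold weak ranker: 1 if feature f reaches theta, else 0."""
    return 1 if x[f] >= theta else 0


def train_rankboost(pairs: List[Pair], rounds: int = 20) -> List[WeakRanker]:
    """RankBoost-style training on (worse, better) pairs of PSM feature vectors.

    Assumption: weak rankers are thresholds on a single (hypothetical) feature.
    """
    n_feat = len(pairs[0][0])
    weights = [1.0 / len(pairs)] * len(pairs)  # distribution D over pairs
    # Candidate thresholds: every feature value observed in the training pairs.
    thresholds = [sorted({x[f] for pair in pairs for x in pair}) for f in range(n_feat)]
    model: List[WeakRanker] = []
    for _ in range(rounds):
        best = (0.0, 0, thresholds[0][0])
        for f in range(n_feat):
            for theta in thresholds[f]:
                # r = sum_D [h(better) - h(worse)]; |r| near 1 means a strong ranker.
                r = sum(w * (_h(b, f, theta) - _h(a, f, theta))
                        for w, (a, b) in zip(weights, pairs))
                if abs(r) > abs(best[0]):
                    best = (r, f, theta)
        r, f, theta = best
        r = max(min(r, 1 - 1e-10), -1 + 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, f, theta))
        if abs(r) >= 1 - 1e-9:
            break  # the weighted weak ranker already orders every pair correctly
        # Increase the weight of pairs this weak ranker orders wrongly.
        weights = [w * math.exp(alpha * (_h(a, f, theta) - _h(b, f, theta)))
                   for w, (a, b) in zip(weights, pairs)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def score(model: List[WeakRanker], x: Vector) -> float:
    """Final ranking score H(x) = sum_t alpha_t * h_t(x)."""
    return sum(alpha * _h(x, f, theta) for alpha, f, theta in model)


# Toy usage with hypothetical 2-feature PSM vectors (not real data):
# each spectrum has one correct candidate and two incorrect ones.
spectra = [
    ([0.8, 0.6], [[0.3, 0.5], [0.4, 0.2]]),
    ([0.7, 0.4], [[0.2, 0.6], [0.5, 0.1]]),
    ([0.9, 0.3], [[0.6, 0.4], [0.1, 0.2]]),
]
# Pairs are formed only within a spectrum: (incorrect, correct).
pairs = [(wrong, right) for right, wrongs in spectra for wrong in wrongs]
model = train_rankboost(pairs, rounds=5)
```

The pairs are built only within each spectrum because the ranking task is to order candidate peptides for the same spectrum; scores of candidates from different spectra are never compared during training in this sketch.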
Pages: 2241-2252
Number of pages: 12