A Ranking-Based Scoring Function for Peptide-Spectrum Matches

Citations: 67
Author
Frank, Ari M. [1]
Affiliation
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Keywords
MS/MS; scoring; peptide; PSM; de novo; database search; machine learning; ranking; boosting; TANDEM MASS-SPECTROMETRY; INDUCED DISSOCIATION SPECTRA; HIDDEN MARKOV MODEL; PROTEIN IDENTIFICATION; DATABASE SEARCH; GENOME ANNOTATION; SEQUENCE DATABASES; POSTTRANSLATIONAL MODIFICATIONS; PROTEOMICS; ALGORITHM;
DOI
10.1021/pr800678b
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data that is generated these days relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to scoring PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state of the art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as a proteogenomic search of a six-frame translation of the human genome (in which we achieve a reduction of the running time by a factor of 15 and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
Pages: 2241-2252
Page count: 12