A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: Support vector machine classification of peptide MS/MS spectra and SEQUEST scores

Cited by: 177
Authors
Anderson, DC
Li, WQ
Payan, DG
Noble, WS
Affiliations
[1] Rigel Inc, San Francisco, CA 94080 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Keywords
shotgun peptide sequencing; SEQUEST; support vector machine; machine learning; mass spectrometry; capillary LC/MS/MS; proteomics
DOI
10.1021/pr0255654
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.
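The classification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the kernel nor the full set of 13 features, so the example assumes a linear SVM trained by subgradient descent on the hinge loss, uses only three features (Xcorr, delta Cn, and fraction of matched MS/MS peaks), and runs on synthetic values invented for illustration. All function and variable names are hypothetical.

```python
import random

# Hypothetical feature order. All values below are synthetic, not from the paper.
FEATURES = ["xcorr", "delta_cn", "fraction_matched_peaks"]


def make_synthetic_peptides(n_per_class, seed=0):
    """Generate made-up feature rows: label +1 = correct ID, -1 = incorrect ID."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_per_class):
        rows.append([rng.gauss(3.5, 0.4), rng.gauss(0.30, 0.05), rng.gauss(0.60, 0.08)])
        labels.append(1)
        rows.append([rng.gauss(1.5, 0.4), rng.gauss(0.05, 0.03), rng.gauss(0.25, 0.08)])
        labels.append(-1)
    return rows, labels


def standardize(rows):
    """Scale each feature to zero mean and unit variance."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 if var > 0 else 1.0)
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]


def decision(w, b, x):
    """Signed score of the linear classifier; >= 0 is read as 'correct ID'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on L2-regularized hinge loss."""
    n, dims = len(X), len(X[0])
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # point violates the margin
                for j in range(dims):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


rows, labels = make_synthetic_peptides(n_per_class=100)
X = standardize(rows)
w, b = train_linear_svm(X, labels)

predictions = [1 if decision(w, b, x) >= 0 else -1 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

print("weights:", dict(zip(FEATURES, w)))
print("training accuracy:", accuracy)
```

Unlike a fixed cutoff on a single score, the learned weight vector combines all features into one decision value. In practice a library implementation with a held-out test set would be preferable to this hand-rolled trainer, which reports training accuracy only.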
Pages: 137-146
Number of pages: 10