Automatic Quality Assessment of Peptide Tandem Mass Spectra

Cited by: 157
Authors
Bern, Marshall [1]
Goldberg, David [1]
McDonald, W. Hayes [2]
Yates, John R., III [2]
Affiliations
[1] Xerox Corp, Palo Alto Res Ctr, Palo Alto, CA 94304 USA
[2] Scripps Res Inst, La Jolla, CA 92037 USA
DOI
10.1093/bioinformatics/bth947
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software, such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are of too poor quality to be useful. Hence a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra judged to be of high quality, but that cannot be identified by database search, are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches including post-translational modifications.
Results: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% of the unidentifiable spectra while losing only 10% of the identifiable spectra. Statistical regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.
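The trade-off quoted in the abstract (fraction of unidentifiable spectra eliminated versus fraction of identifiable spectra lost) can be illustrated with a minimal sketch. This is not the authors' classifier: it assumes each spectrum already has some numeric quality score and a known identifiable/unidentifiable label, and the function name `filter_tradeoff` and its parameters are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def filter_tradeoff(
    scores: Sequence[float],
    identifiable: Sequence[bool],
    max_loss: float = 0.10,
) -> Tuple[float, float, float]:
    """Choose a score cutoff that discards at most `max_loss` of the
    identifiable spectra, then report how many unidentifiable spectra
    the same cutoff removes.

    Hypothetical helper: `scores` are assumed to come from any quality
    model (higher means better); nothing here reproduces the paper's method.
    Returns (threshold, fraction_identifiable_lost, fraction_unidentifiable_removed).
    """
    good = sorted(s for s, ok in zip(scores, identifiable) if ok)
    bad = [s for s, ok in zip(scores, identifiable) if not ok]
    if not good or not bad:
        raise ValueError("need at least one spectrum of each class")

    # Number of identifiable spectra we are allowed to discard.
    n_drop = math.floor(max_loss * len(good))
    # Spectra scoring strictly below the threshold are filtered out.
    threshold = good[n_drop]

    lost = sum(1 for s in good if s < threshold) / len(good)
    removed = sum(1 for s in bad if s < threshold) / len(bad)
    return threshold, lost, removed


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0.5, 1.5, 5]
    toy_labels = [True] * 10 + [False] * 4
    print(filter_tradeoff(toy_scores, toy_labels))
```

On the toy data the cutoff loses one of ten identifiable spectra and removes three of four unidentifiable ones; real figures depend entirely on the scoring model and the data set.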
Pages: 49-54
Number of pages: 6