Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering

Cited by: 63
Authors
Flikka, K
Martens, L
Vandekerckhove, J
Gevaert, K
Eidhammer, I
Affiliations
[1] Univ Bergen, Bergen Ctr Computat Sci, Computat Biol Unit, N-5008 Bergen, Norway
[2] Univ Bergen, Proteom Unit, Bergen, Norway
[3] Univ Bergen, Dept Informat, Bergen, Norway
[4] State Univ Ghent VIB, Dept Med Prot Res, B-9000 Ghent, Belgium
[5] State Univ Ghent VIB, Dept Biochem, B-9000 Ghent, Belgium
Keywords
peptide-centric proteomics; protein identification; spectrum quality; tandem mass spectrometry;
DOI
10.1002/pmic.200500309
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
In contemporary peptide-centric or non-gel proteome studies, vast amounts of peptide fragmentation data are generated, of which only a small part leads to peptide or protein identification. This motivates the development and use of a filtering algorithm that removes spectra that contribute little to protein identification. Removal of unidentifiable spectra reduces both the computational and human time spent on analyzing spectra and the chances of obtaining false identifications. Thorough testing on various proteome datasets from different instruments showed that the best suggested machine-learning classifier is, on average, able to recognize half of the unidentified spectra as bad spectra. Further analyses showed that several unidentified spectra classified as good were derived from peptides carrying unanticipated amino acid modifications or contained sequence tags that allowed peptide identification using homology searches. The implementation of the classifiers is available under the GNU General Public License at http://www.bioinfo.no/software/spectrumquality.
Pages: 2086-2094
Page count: 9
Related papers
33 in total
[1]  
[Anonymous], 1993, P 13 INT JOINT C ART
[2]   Improving large-scale proteomics by clustering of mass spectrometry data [J].
Beer, I ;
Barnea, E ;
Ziv, T ;
Admon, A .
PROTEOMICS, 2004, 4 (04) :950-960
[3]   Automatic Quality Assessment of Peptide Tandem Mass Spectra [J].
Bern, Marshall ;
Goldberg, David ;
McDonald, W. Hayes ;
Yates, John R., III .
BIOINFORMATICS, 2004, 20 :49-54
[4]   The need for guidelines in publication of peptide and protein identification data - Working group on publication guidelines for peptide and protein identification data [J].
Carr, S ;
Aebersold, R ;
Baldwin, M ;
Burlingame, A ;
Clauser, K ;
Nesvizhskii, A .
MOLECULAR & CELLULAR PROTEOMICS, 2004, 3 (06) :531-533
[5]   Unimod: Protein modifications for mass spectrometry [J].
Creasy, DM ;
Cottrell, JS .
PROTEOMICS, 2004, 4 (06) :1534-1536
[6]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989
[7]  
Freund Y, 1999, MACHINE LEARNING, PROCEEDINGS, P124
[8]  
Freund Y, 1996, ICML
[9]   Bayesian network classifiers [J].
Friedman, N ;
Geiger, D ;
Goldszmidt, M .
MACHINE LEARNING, 1997, 29 (2-3) :131-163
[10]   Cyclization of N-terminal S-carbamoylmethylcysteine causing loss of 17 Da from peptides and extra peaks in peptide maps [J].
Geoghegan, KF ;
Hoth, LR ;
Tan, DH ;
Borzilleri, KA ;
Withka, JM ;
Boyd, JG .
JOURNAL OF PROTEOME RESEARCH, 2002, 1 (02) :181-187