Peak intensity prediction in MALDI-TOF mass spectrometry: A machine learning study to support quantitative proteomics

Cited by: 28
Authors
Timm, Wiebke [1, 4]
Scherbart, Alexandra [1 ]
Boecker, Sebastian [2 ]
Kohlbacher, Oliver [3 ]
Nattkemper, Tim W. [1 ]
Affiliations
[1] Univ Bielefeld, Appl Neuroinformat Grp, D-4800 Bielefeld, Germany
[2] Univ Jena, Jena, Germany
[3] Univ Tubingen, Ctr Bioinformat Tubingen, Tubingen, Germany
[4] Univ Bielefeld, Intl NRW Grad Sch Bioinformat & Genome Res, D-4800 Bielefeld, Germany
DOI
10.1186/1471-2105-9-443
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g., labeling techniques), reliable prediction of peak intensities from peptide sequences could provide a peptide-specific correction factor. It would thus be a valuable tool for label-free absolute quantification. Results: In this work we present machine learning techniques for peak intensity prediction in MALDI mass spectra. Features encoding the peptides' physico-chemical properties, as well as string-based features, were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross-validation). Conclusion: The techniques presented here are a useful first step beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions will in turn benefit mass spectrometry-based quantitative proteomics.
Pages: 18
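The evaluation reported in the abstract (a Pearson correlation measured in a ten-fold cross-validation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single input feature is hypothetical, and a one-variable least-squares line stands in for the support vector regression and local linear map models, which are not reproduced here. Pooling the held-out predictions before computing the correlation is also an assumption; the record does not say whether the study pooled predictions or averaged per-fold values.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (a stand-in model, not SVR or LLM)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_validated_pearson(xs, ys, k=10, seed=0):
    """Predict each sample with a model trained on the other folds,
    then correlate the pooled predictions with the true targets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a * xs[i] + b
    return pearson(preds, ys)


# Synthetic data: one hypothetical feature and a noisy target "intensity".
rng = random.Random(42)
feature = [rng.uniform(0.0, 1.0) for _ in range(200)]
intensity = [2.0 * x + rng.gauss(0.0, 0.5) for x in feature]
r = cross_validated_pearson(feature, intensity)
print(f"10-fold cross-validated Pearson r = {r:.2f}")
```

Because each prediction comes from a model that never saw the corresponding sample, the resulting correlation estimates how well the model generalizes rather than how well it fits the training data.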