Machine learning approaches to lung cancer prediction from mass spectra

Cited by: 34
Authors
Hilario, M
Kalousis, A
Müller, M
Pellegrini, C
Affiliations
[1] Univ Geneva, CUI, Dept Comp Sci, CH-1211 Geneva, Switzerland
[2] Swiss Inst Bioinformat, Geneva, Switzerland
Keywords
classification; diagnosis; lung cancer; mass spectra; variable selection
DOI
10.1002/pmic.200300523
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We addressed the problem of discriminating between 24 diseased and 17 healthy specimens on the basis of protein mass spectra. To prepare the data, we performed mass-to-charge ratio (m/z) normalization, baseline elimination, and conversion of absolute peak height measures to height ratios. After preprocessing, the major difficulty encountered was the extremely large number of variables (1676 m/z values) versus the number of examples (41). Dimensionality reduction was treated as an integral part of the classification process; variable selection was coupled with model construction in a single ten-fold cross-validation loop. We explored different experimental setups involving two peak height representations, two variable selection methods, and six induction algorithms, all on both the original 1676-mass data set and on a prescreened 124-mass data set. Highest predictive accuracies (1-2 off-sample misclassifications) were achieved by a multilayer perceptron and Naive Bayes, with the latter displaying more consistent performance (hence greater reliability) over varying experimental conditions. We attempted to identify the most discriminant peaks (proteins) on the basis of scores assigned by the two variable selection methods and by neural-network-based sensitivity analysis. These three scoring schemes consistently ranked four peaks as the most relevant discriminators.
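The coupling of variable selection with model construction inside one cross-validation loop can be illustrated with a minimal sketch. It is not the authors' implementation. The filter score (class mean difference divided by the summed class standard deviations), the Gaussian Naive Bayes variant, the number of retained variables (`n_keep`), and the synthetic data generator are all assumptions made for illustration; the abstract does not name its two selection methods or give their parameters. The point shown is only that variables are re-selected from the training portion of each fold, so the held-out specimens never influence which m/z values are kept.

```python
import math
import random
from statistics import mean, pvariance


def make_synthetic(n_pos=24, n_neg=17, n_vars=50, shift=6.0, seed=1):
    """Hypothetical stand-in data: 41 specimens, only variables 0 and 1 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_pos), (0, n_neg)):
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


def score_features(X, y):
    """Assumed filter: |difference of class means| / (sum of class std devs)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = math.sqrt(pvariance(pos)) + math.sqrt(pvariance(neg)) + 1e-9
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def fit_gaussian_nb(X, y, cols):
    """Per class: log prior plus (mean, variance) of each selected variable."""
    model = {}
    for c in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == c]
        stats = []
        for j in cols:
            vals = [r[j] for r in rows]
            stats.append((mean(vals), pvariance(vals) + 1e-6))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model


def predict(model, row, cols):
    """Return the class with the highest log posterior (up to a constant)."""
    best, best_lp = None, -math.inf
    for c, (log_prior, stats) in model.items():
        lp = log_prior
        for j, (mu, var) in zip(cols, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = c, lp
    return best


def cross_validate(X, y, k_folds=10, n_keep=5, seed=0):
    """Count held-out misclassifications; selection is redone inside every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k_folds] for i in range(k_folds)]
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Variable selection sees training specimens only.
        scores = score_features(X_train, y_train)
        cols = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:n_keep]
        model = fit_gaussian_nb(X_train, y_train, cols)
        errors += sum(predict(model, X[i], cols) != y[i] for i in test_idx)
    return errors


if __name__ == "__main__":
    X, y = make_synthetic()
    print("held-out misclassifications:", cross_validate(X, y))
```

Selecting variables once on all 41 specimens and cross-validating only the classifier would let information from the held-out specimens leak into the selected set, which tends to produce optimistic error estimates when variables greatly outnumber examples.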
Pages: 1716-1719
Number of pages: 4