Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation

Cited by: 57
Authors
Alexandrov, Theodore [1]
Decker, Jens [2]
Mertens, Bart [3]
Deelder, Andre M. [4]
Tollenaar, Rob A. E. M. [5]
Maass, Peter [1]
Thiele, Herbert [2]
Affiliations
[1] Univ Bremen, Ctr Ind Math, D-28334 Bremen, Germany
[2] Bruker Daltonik GmbH, D-28359 Bremen, Germany
[3] Leiden Univ, Med Ctr, Dept Med Stat & Bioinformat, NL-2300 RC Leiden, Netherlands
[4] Leiden Univ, Med Ctr, Dept Parasitol, NL-2300 RC Leiden, Netherlands
[5] Leiden Univ, Med Ctr, Dept Surg, NL-2300 RC Leiden, Netherlands
Keywords
CANCER; PROTEOMICS
DOI
10.1093/bioinformatics/btn662
Chinese Library Classification
Q5 [Biochemistry]
Subject Classification Codes
071010; 081704
Abstract
Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential for the early diagnosis of cancer. We propose a new procedure for biomarker discovery in serum protein profiles based on (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test; and (iii) building and evaluating a support vector machine (SVM) classifier by double cross-validation, with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of the spectra that discriminate cancer patients from control individuals. The evaluation was performed on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls.

Results: Our procedure achieved a high total recognition rate (97.3%), sensitivity (98.4%) and specificity (95.8%). The extracted biomarker patterns mostly correspond to peaks that show mean differences between the cancer and control spectra. However, we showed that the discriminative power of a peak is not simply expressed by its mean height and cannot be derived by comparing the mean spectra. The obtained classifiers have high generalization power, as measured by the number of support vectors. This prevents overfitting and contributes to the reproducibility of the results, which is required to find biomarkers that differentiate cancer patients from healthy individuals.
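The three steps of the procedure described in the Motivation can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the Haar wavelet, the Welch t-statistic standing in for "a statistical test", ranking by |t| with a fixed number k of retained coefficients, and all function names (`haar_dwt`, `welch_t`, `select_coefficients`, `evaluate`, `synthetic_spectrum`) are hypothetical choices made for illustration. The SVM and the double cross-validation loop are left out; `evaluate` shows how total recognition rate, sensitivity and specificity follow from predicted labels, assuming cancer is coded as 1 and control as 0.

```python
# Illustrative sketch, not the authors' code: the wavelet, the test and the
# selection rule below are assumptions.
import math
import random
from typing import Dict, List, Sequence


def haar_dwt(signal: Sequence[float], levels: int) -> List[float]:
    """Multi-level orthonormal Haar DWT (Haar is an assumed wavelet choice).

    Returns [approx_L, detail_L, ..., detail_1] concatenated, so the output
    has the same length as the input and preserves its energy.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    s = 1.0 / math.sqrt(2.0)
    approx = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        idx = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) * s for i in idx])
        approx = [(approx[i] + approx[i + 1]) * s for i in idx]
    coeffs = approx
    for d in reversed(details):  # coarsest detail level first
        coeffs.extend(d)
    return coeffs


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t-statistic (assumed test); each group needs >= 2 values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0.0:
        # Zero variance in both groups: no separation if the means agree,
        # otherwise perfect separation.
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def select_coefficients(cancer: Sequence[Sequence[float]],
                        control: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k coefficients with the largest |t| (hypothetical rule)."""
    scores = []
    for j in range(len(cancer[0])):
        t = welch_t([row[j] for row in cancer], [row[j] for row in control])
        scores.append((abs(t), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Total recognition rate, sensitivity, specificity; 1 = cancer, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "recognition_rate": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of cancer cases detected
        "specificity": ratio(tn, tn + fp),  # fraction of controls correctly rejected
    }


if __name__ == "__main__":
    rng = random.Random(0)

    def synthetic_spectrum(peak: float) -> List[float]:
        # 16 intensity bins of Gaussian noise with an optional peak in bins 8-9.
        s = [rng.gauss(0.0, 0.1) for _ in range(16)]
        s[8] += peak
        s[9] += peak
        return s

    cancer = [haar_dwt(synthetic_spectrum(1.0), 2) for _ in range(20)]
    control = [haar_dwt(synthetic_spectrum(0.0), 2) for _ in range(20)]
    print("selected coefficients:", select_coefficients(cancer, control, k=3))
```

In a double cross-validation, the coefficient selection would have to be repeated inside each training fold using only the training spectra; selecting coefficients on all spectra before splitting leaks information into the test folds and inflates the reported rates. The SVM step could be filled in with, for example, scikit-learn's `SVC`, whose fitted `n_support_` attribute gives the number of support vectors per class. One standard way to link that count to generalization is the leave-one-out argument: removing a training point that is not a support vector leaves the SVM solution unchanged, so the leave-one-out error rate is at most n_SV / n. The record does not say whether the authors used exactly this quantity.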
Pages: 643-649
Number of pages: 7