Testing the additional predictive value of high-dimensional molecular data

Cited by: 28
Authors
Boulesteix, Anne-Laure [1 ,2 ]
Hothorn, Torsten [2 ]
Affiliations
[1] Univ Munich, Dept Med Informat Biometry & Epidemiol, D-81377 Munich, Germany
[2] Univ Munich, Dept Stat, D-80539 Munich, Germany
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
BREAST-CANCER PROGNOSIS; CLASSIFICATION; ASSOCIATION; OPTIMISM;
DOI
10.1186/1471-2105-11-78
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis in biomedical research for about ten years, the question of the additional predictive value of such data, given that classical predictors are already available, has long been under-considered in the bioinformatics literature. Results: We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice for the choice of the only method parameter (the number of boosting iterations). In simulations, our novel approach is found to have very good power in different settings, e.g., few strong predictors or many weak predictors. For illustration, it is applied to two publicly available cancer data sets. Conclusions: Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available. It is implemented in the R package "globalboosttest", which is publicly available from R-forge and will be submitted to CRAN as soon as possible.
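The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the "globalboosttest" implementation, and the abstract does not spell out the algorithm, so the following are assumptions made for illustration: the clinical logistic fit enters the boosting step as a fixed offset, the test statistic is the negative binomial log-likelihood after `mstop` boosting iterations, the molecular profiles are permuted across patients, and the p-value uses an add-one correction. All function names, the gradient-ascent fit, and the step size `nu = 0.1` are hypothetical choices.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clinical_offset(Z, y, steps=500, lr=0.5):
    """Approximate logistic regression of y on clinical covariates Z (with
    intercept) by plain gradient ascent; returns the linear predictors."""
    n, q = len(Z), len(Z[0])
    beta = [0.0] * (q + 1)  # beta[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (q + 1)
        for zi, yi in zip(Z, y):
            r = yi - sigmoid(beta[0] + sum(b * z for b, z in zip(beta[1:], zi)))
            grad[0] += r
            for k in range(q):
                grad[k + 1] += r * zi[k]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return [beta[0] + sum(b * z for b, z in zip(beta[1:], zi)) for zi in Z]


def neg_loglik(f, y):
    # Negative binomial log-likelihood for scores f on the log-odds scale.
    return sum(max(fi, 0.0) - yi * fi + math.log1p(math.exp(-abs(fi)))
               for fi, yi in zip(f, y))


def boosted_loss(X, y, offset, mstop=30, nu=0.1):
    """Componentwise boosting on the (centered) columns of X, started from
    the clinical offset; returns the final negative log-likelihood."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        cols.append([v - mean for v in col])
    ssq = [sum(v * v for v in c) for c in cols]
    f = list(offset)
    for _ in range(mstop):
        u = [yi - sigmoid(fi) for yi, fi in zip(y, f)]  # negative gradient
        best_j, best_gain, best_coef = None, -1.0, 0.0
        for j in range(p):
            if ssq[j] == 0.0:
                continue
            dot = sum(x * r for x, r in zip(cols[j], u))
            gain = dot * dot / ssq[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_coef = j, gain, dot / ssq[j]
        if best_j is None:
            break
        f = [fi + nu * best_coef * x for fi, x in zip(f, cols[best_j])]
    return neg_loglik(f, y)


def permutation_pvalue(X, Z, y, n_perm=49, mstop=30, seed=0):
    """Permutation p-value for the added value of X given Z (sketch)."""
    rng = random.Random(seed)
    offset = clinical_offset(Z, y)
    observed = boosted_loss(X, y, offset, mstop)
    rows = list(X)  # shallow copy so X itself is left untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(rows)  # break the link between X and (y, Z)
        if boosted_loss(rows, y, offset, mstop) <= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Here `X` is a list of per-patient molecular profiles, `Z` a list of per-patient clinical covariate vectors, and `y` a list of 0/1 outcomes. Keeping the offset fixed across permutations means only the molecular contribution is re-estimated, which is why each permutation costs no more than one boosting run.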
Pages: 11