Microarray-based classification and clinical predictors: on combined classifiers and additional predictive value

Times cited: 64
Authors
Boulesteix, Anne-Laure [1 ,2 ]
Porzelius, Christine [1 ,3 ]
Daumer, Martin [1 ]
Affiliations
[1] Sylvia Lawry Ctr MS Res, D-81677 Munich, Germany
[2] Univ Munich, Dept Stat, D-80539 Munich, Germany
[3] Univ Hosp Freiburg, Inst Med Biometry & Med Informat, D-79104 Freiburg, Germany
DOI
10.1093/bioinformatics/btn262
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: In clinical bioinformatics, methods are needed to assess the additional predictive value of microarray data compared with simple clinical parameters alone. Such methods should also provide an optimal prediction rule that exploits the full potential of both types of data; ideally, they should be able to detect subtypes that are not identified by clinical parameters alone. Moreover, they should address the question of the additional predictive value of microarray data within a fair framework.
Results: We propose a novel but simple two-step approach based on random forests and partial least squares (PLS) dimension reduction. It embeds the idea of pre-validation suggested by Tibshirani and colleagues, which relies on an internal cross-validation to avoid overfitting. Our approach is fast and flexible, and can be used both for assessing the overall additional significance of the microarray data and for building optimal hybrid classification rules. Its efficiency is demonstrated through simulations and an application to breast cancer and colorectal cancer data.
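The pre-validation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single PLS component, a binary 0/1 outcome, plain K-fold splitting, and toy data, and all function names (`pls_first_direction`, `prevalidated_component`, `make_toy_data`) are hypothetical. Each sample's PLS score is computed from weights fitted without that sample, so the score can later be compared fairly with clinical parameters. The second step (a random forest on clinical parameters plus the pre-validated score) is omitted here.

```python
import random


def pls_first_direction(X, y):
    """Weights of the first PLS component: w proportional to Xc^T yc (centered)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = sum(v * v for v in w) ** 0.5 or 1.0  # guard against a zero vector
    return [v / norm for v in w], x_mean


def project(X, w, x_mean):
    """Scores of the samples in X on the direction w, using training means."""
    return [sum((row[j] - x_mean[j]) * w[j] for j in range(len(w))) for row in X]


def prevalidated_component(X, y, n_folds=5, seed=0):
    """Pre-validated PLS score: each sample is scored by a model that never saw it."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit the PLS direction on the training folds only.
        w, x_mean = pls_first_direction([X[i] for i in train], [y[i] for i in train])
        # Score the held-out samples with that direction.
        for i, t in zip(held_out, project([X[i] for i in held_out], w, x_mean)):
            scores[i] = t
    return scores


def make_toy_data(n=40, p=20, seed=1):
    """Toy 'expression' matrix: the first 3 of p features are shifted in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [
        [rng.gauss(1.0 if (j < 3 and y[i]) else 0.0, 1.0) for j in range(p)]
        for i in range(n)
    ]
    return X, y


X, y = make_toy_data()
z = prevalidated_component(X, y)  # one pre-validated score per sample
```

In a fuller pipeline, `z` would be appended to the clinical covariates as one extra predictor. Its contribution could then be assessed without the optimism that arises when the same samples are used both to build and to evaluate the microarray score.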
Pages: 1698-1706
Number of pages: 9
Related papers
39 in total
[1]  
[Anonymous], 1966, Multivariate Analysis
[2]   Partial least squares for discrimination [J].
Barker, M ;
Rayens, W .
JOURNAL OF CHEMOMETRICS, 2003, 17 (03) :166-173
[3]   Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models [J].
Binder, Harald ;
Schumacher, Martin .
BMC BIOINFORMATICS, 2008, 9 (1)
[4]   Gene expression profile in multiple sclerosis patients and healthy controls: identifying pathways relevant to disease [J].
Bomprezzi, R ;
Ringnér, M ;
Kim, S ;
Bittner, ML ;
Khan, J ;
Chen, YD ;
Elkahloun, A ;
Yu, AM ;
Bielekova, B ;
Meltzer, PS ;
Martin, R ;
McFarland, HF ;
Trent, JM .
HUMAN MOLECULAR GENETICS, 2003, 12 (17) :2191-2199
[5]  
Boulesteix AL, 2008, CANCER INFORM, V6, P77
[6]  
Boulesteix A.L., 2004, STAT APPL GENET MOL, V3, P33, DOI [10.2202/1544-6115.1075, DOI 10.2202/1544-6115.1075]
[7]   WilcoxCV: an R package for fast variable selection in cross-validation [J].
Boulesteix, Anne-Laure .
BIOINFORMATICS, 2007, 23 (13) :1702-1704
[8]   Partial least squares: a versatile tool for the analysis of high-dimensional genomic data [J].
Boulesteix, Anne-Laure ;
Strimmer, Korbinian .
BRIEFINGS IN BIOINFORMATICS, 2007, 8 (01) :32-44
[9]  
Boulesteix AL, 2006, STAT APPL GENET MOL, V5
[10]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32