Cross-study validation for the assessment of prediction algorithms

Times cited: 63
Authors
Bernau, Christoph [1 ,2 ]
Riester, Markus [3 ,4 ]
Boulesteix, Anne-Laure [2 ]
Parmigiani, Giovanni [3 ,4 ]
Huttenhower, Curtis
Waldron, Levi [5 ]
Trippa, Lorenzo [3 ,4 ]
Affiliations
[1] Leibniz Supercomp Ctr, Garching, Germany
[2] Dept Med Informat Biometry & Epidemiol, Cambridge, MA USA
[3] Dana Farber Canc Inst, Boston, MA 02115 USA
[4] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA
[5] CUNY Hunter Coll, Sch Publ Hlth, New York, NY 10021 USA
Funding
National Science Foundation (USA);
Keywords
NEGATIVE BREAST-CANCER; HIGH-DIMENSIONAL DATA; PROGNOSTIC SIGNATURE; ERROR ESTIMATION; CLASSIFIERS; SURVIVAL; MODELS; LUNG; BIOINFORMATICS; METASTASIS;
DOI
10.1093/bioinformatics/btu279
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context.
Methods: We develop and implement a systematic approach to 'cross-study validation', to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We compute the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation.
Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated estimates of discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation.
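The pairwise scheme described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated studies, the toy covariance-based learner (`fit_toy`), and the plain mean used as a summary are all assumptions chosen for brevity. Only the overall structure follows the abstract: train on each study, validate on every other study, and record the C-index for each (training, validation) pair.

```python
import math
import random
from itertools import combinations


def c_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the subject
    with the shorter observed time has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier time is censored
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def fit_toy(X, times):
    """Hypothetical stand-in learner (an assumption, not from the paper):
    weight of each feature = minus its covariance with observed time."""
    n = len(X)
    mean_t = sum(times) / n
    weights = []
    for k in range(len(X[0])):
        mean_x = sum(row[k] for row in X) / n
        cov = sum((row[k] - mean_x) * (t - mean_t) for row, t in zip(X, times)) / n
        weights.append(-cov)
    return weights


def predict(weights, X):
    """Linear risk score for each row of X."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


def simulate_study(n, beta, shift, rng):
    """Simulated study: exponential survival with hazard exp(x . beta),
    independent exponential censoring; `shift` mimics a study effect."""
    X, times, events = [], [], []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0) for _ in beta]
        hazard = math.exp(sum(b * v for b, v in zip(beta, x)))
        t, c = rng.expovariate(hazard), rng.expovariate(0.5)
        X.append(x)
        times.append(min(t, c))
        events.append(t <= c)
    return X, times, events


def cross_study_matrix(studies):
    """Z[i][j] = C-index of the model trained on study i, validated on study j.
    The diagonal is left as None (no independent validation there)."""
    k = len(studies)
    Z = [[None] * k for _ in range(k)]
    for i, (Xi, ti, _) in enumerate(studies):
        weights = fit_toy(Xi, ti)
        for j, (Xj, tj, ej) in enumerate(studies):
            if i != j:
                Z[i][j] = c_index(tj, ej, predict(weights, Xj))
    return Z


rng = random.Random(0)
beta = [1.0, -1.0, 0.5]
studies = [simulate_study(80, beta, shift, rng) for shift in (0.0, 0.3, -0.3)]
Z = cross_study_matrix(studies)

# One simple summary (an assumption): mean of the off-diagonal entries.
off_diagonal = [z for row in Z for z in row if z is not None]
summary = sum(off_diagonal) / len(off_diagonal)
print("cross-study C-index matrix:", Z)
print("mean off-diagonal C-index:", summary)
```

Leaving the diagonal empty keeps the sketch focused on between-study validation; a within-study cross-validation estimate could be placed there to allow the comparison with conventional cross-validation that the abstract describes.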
Pages: 105-112
Page count: 8