Assessment of evaluation criteria for survival prediction from genomic data

Cited by: 18
Authors
Bovelstad, Hege M. [1, 2]
Borgan, Ornulf [1]
Affiliations
[1] Univ Oslo, Dept Math, NO-0316 Oslo, Norway
[2] Univ Tromso, Dept Community Med, NO-9037 Tromso, Norway
Keywords
AUC; Brier Score; Cox regression; Explained variation; Microarray gene expression data; GENE-EXPRESSION DATA; PARTIAL LEAST-SQUARES; COX REGRESSION; PATIENT SURVIVAL; TIME PREDICTION; MICROARRAY DATA; MODELS; SELECTION
DOI
10.1002/bimj.201000048
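Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
090105 [Crop Production Systems and Ecological Engineering]
Abstract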
Survival prediction from high-dimensional genomic data is dependent on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Without a firm knowledge about whether the choice of evaluation criterion may affect the conclusions made as to which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R²-measure based on the Cox partial likelihood, and an R²-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from the other three criteria studied. For future studies, where one also might want to include non-likelihood or non-model-based regularization methods, we argue in favor of AUC and the R²-measure based on the Brier score, as these do not suffer from the arbitrary splitting into two groups nor depend on the Cox partial likelihood.
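To make the last of the four criteria concrete, the following is a minimal sketch of a Brier-score-based R² at a single time point t, written as 1 - BS_model(t) / BS_null(t). It is an illustration under simplifying assumptions and not the authors' implementation: it assumes no censoring (with censored data, a Brier score such as that of Graf et al. (1999) applies inverse-probability-of-censoring weights, omitted here), and the function names `brier_score` and `brier_r2` are hypothetical.

```python
from typing import Sequence


def brier_score(times: Sequence[float], surv_probs: Sequence[float], t: float) -> float:
    """Mean squared difference between the observed status 1{T_i > t}
    and the predicted survival probability S(t | x_i).

    Assumption: every survival time is fully observed (no censoring).
    """
    if len(times) != len(surv_probs) or len(times) == 0:
        raise ValueError("times and surv_probs must be non-empty and of equal length")
    total = 0.0
    for time_i, prob_i in zip(times, surv_probs):
        alive = 1.0 if time_i > t else 0.0  # observed status at time t
        total += (alive - prob_i) ** 2
    return total / len(times)


def brier_r2(times: Sequence[float], model_probs: Sequence[float], t: float) -> float:
    """R^2-type measure: relative reduction in Brier score versus a null model.

    The null model ignores covariates and predicts the same survival
    probability for everyone: the observed fraction surviving past t.
    """
    frac_alive = sum(1 for time_i in times if time_i > t) / len(times)
    null_probs = [frac_alive] * len(times)
    bs_null = brier_score(times, null_probs, t)
    if bs_null == 0.0:
        raise ValueError("null Brier score is zero; R^2 is undefined at this t")
    return 1.0 - brier_score(times, model_probs, t) / bs_null


if __name__ == "__main__":
    # Toy data (made up for illustration): four patients, evaluated at t = 2.5.
    survival_times = [1.0, 2.0, 3.0, 4.0]
    predicted = [0.2, 0.2, 0.8, 0.8]  # hypothetical model output S(2.5 | x_i)
    print(brier_r2(survival_times, predicted, 2.5))
```

A value of 1 corresponds to perfect prediction at time t, and 0 to a model that does no better than the covariate-free null model. Because the measure only needs predicted survival probabilities, it can be computed for methods that are not based on the Cox partial likelihood, which is the property the abstract argues for.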
Pages: 202-216
Page count: 15
Related papers
41 in total
[1]
NEAREST-NEIGHBOR ESTIMATION OF A BIVARIATE DISTRIBUTION UNDER RANDOM CENSORING [J].
AKRITAS, MG .
ANNALS OF STATISTICS, 1994, 22 (03) :1299-1327
[2]
Prediction by supervised principal components [J].
Bair, E ;
Hastie, T ;
Paul, D ;
Tibshirani, R .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) :119-137
[3]
Semi-supervised methods to predict patient survival from gene expression data [J].
Bair, E ;
Tibshirani, R .
PLOS BIOLOGY, 2004, 2 (04) :511-522
[4]
Predicting survival from microarray data - a comparative study [J].
Bovelstad, H. M. ;
Nygard, S. ;
Storvold, H. L. ;
Aldrin, M. ;
Borgan, O. ;
Frigessi, A. ;
Lingjaerde, O. C. .
BIOINFORMATICS, 2007, 23 (16) :2080-2087
[5]
Survival prediction from clinico-genomic models - a comparative study [J].
Bovelstad, Hege M. ;
Nygard, Stale ;
Borgan, Ornulf .
BMC BIOINFORMATICS, 2009, 10
[6]
COX DR, 1972, J R STAT SOC B, V34, P187
[7]
Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO [J].
Datta, Susmita ;
Le-Rademacher, Jennifer ;
Datta, Somnath .
BIOMETRICS, 2007, 63 (01) :259-271
[8]
Graf E, 1999, STAT MED, V18, P2529
[9]
Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data [J].
Gui, J ;
Li, HZ .
BIOINFORMATICS, 2005, 21 (13) :3001-3008
[10]
HANLEY JA, 2005, ENCY BIOSTATISTICS, P4523