Predicting survival from microarray data: a comparative study

Cited by: 222
Authors
Bovelstad, H. M. [1]
Nygard, S.
Storvold, H. L.
Aldrin, M.
Borgan, O.
Frigessi, A.
Lingjaerde, O. C.
Affiliations
[1] Univ Oslo, Dept Math, Oslo, Norway
[2] Univ Oslo, Dept Informat, Oslo, Norway
[3] Univ Oslo, Norwegian Comp Ctr, Oslo, Norway
[4] Univ Oslo, Inst Basic Med Sci, Dept Biostat, Oslo, Norway
DOI
10.1093/bioinformatics/btm305
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. Such data pose the methodological problem of having many more gene expression values than individuals. In addition, the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtaining parameter estimates through some dimension reduction or parameter shrinkage technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso.

Results: Statistical learning from subsets should be repeated several times in order to get a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values perform much better than the simple variable selection methods. For our data sets, ridge regression has the best overall performance.

Availability: Matlab and R code for the prediction methods is available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.

Contact: hegembo@math.uio.no
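The abstract names ridge regression under Cox's proportional hazards model as the best overall performer. As a rough illustration of what that estimator minimizes, the sketch below evaluates the negative Cox partial log-likelihood plus a ridge penalty. It is not the authors' Matlab or R code: the function name `ridge_cox_objective`, the penalty form `lam * sum(beta_j^2)`, the no-ties assumption and the toy data are all assumptions made for this example.

```python
import math


def ridge_cox_objective(beta, X, time, event, lam):
    """Negative Cox partial log-likelihood plus a ridge penalty.

    Hypothetical helper for illustration. Assumes no tied event times.
    beta  : list of coefficients, one per gene
    X     : list of rows, one row of expression values per individual
    time  : observed follow-up times
    event : 1 if the event was observed, 0 if the time is censored
    lam   : non-negative ridge penalty weight
    """
    # Linear predictor x_i' beta for every individual.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored individuals enter only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(risk))

    penalty = lam * sum(b * b for b in beta)
    return -loglik + penalty


# Toy data (made up): 3 individuals, 2 genes, the second time is censored.
X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
time = [1.0, 2.0, 3.0]
event = [1, 0, 1]
print(ridge_cox_objective([0.1, -0.2], X, time, event, lam=1.0))
```

In practice the objective would be minimized over `beta` with a numerical optimizer, and `lam` would be chosen by cross-validation; both steps are omitted here.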
Pages: 2080-2087
Page count: 8