Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

Cited by: 53
Authors
Ali, A. M. G. [1]
Dawson, S.-J. [2, 3]
Blows, F. M. [2]
Provenzano, E. [3, 4, 5]
Ellis, I. O. [6]
Baglietto, L. [7]
Huntsman, D. [8, 9, 10]
Caldas, C. [2, 3, 4, 5]
Pharoah, P. D. [1, 2]
Affiliations
[1] Univ Cambridge, Dept Publ Hlth & Primary Care, Strangeways Res Lab, Cambridge CB1 8RN, England
[2] Univ Cambridge, Dept Oncol, Cambridge CB1 8RN, England
[3] Canc Res UK Cambridge Res Inst, Cambridge, England
[4] Cambridge Univ Hosp, Addenbrookes Hosp, NHS Fdn Trust, Cambridge Breast Unit, Cambridge, England
[5] NIHR Cambridge Biomed Res Ctr, Cambridge, England
[6] City Hosp Nottingham, Dept Histopathol, Nottingham, England
[7] Canc Epidemiol Ctr, Canc Council Victoria, Carlton, Vic, Australia
[8] Genet Pathol Evaluat Ctr, Dept Pathol, Vancouver, BC, Canada
[9] Vancouver Gen Hosp, British Columbia Canc Agcy, Prostate Res Ctr, Vancouver, BC, Canada
[10] Univ British Columbia, Vancouver, BC V5Z 1M9, Canada
Funding
National Health and Medical Research Council of Australia
Keywords
missing data; multiple imputation; complete case analysis; missing covariates; tissue micro-arrays; MULTIPLE IMPUTATION; PREDICTOR VALUES; REGRESSION;
DOI
10.1038/sj.bjc.6606078
Chinese Library Classification number
R73 [Oncology]
Subject classification code
100214 [Oncology]
Abstract
BACKGROUND: Tissue micro-arrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods for dealing with missing data are available, but the validity of each approach depends on the structure of the missing data, and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
PATIENTS AND METHODS: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI-) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
RESULTS: Over half the cases had missing data on at least one of the seven variables and 11% had missing data on four or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but they were less precise because their standard errors were larger. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI+ were least biased and most accurate, whereas estimates for CCA were most biased and least accurate.
CONCLUSION: In this study, empirical results from analyses using CCA, MS, MI- and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI+ is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI+ and CCA should be compared in any multi-variate analysis where missing data are a problem.
British Journal of Cancer (2011) 104, 693-699. doi:10.1038/sj.bjc.6606078 www.bjcancer.com Published online 25 January 2011 © 2011 Cancer Research UK
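The abstract's distinction between MCAR and MAR, and its contrast between complete case analysis, mean substitution and model-based imputation, can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it estimates a simple mean rather than a Cox hazard ratio, the variables `x` (a fully observed covariate) and `y` (a marker score with missing values) are hypothetical, and the repeated stochastic regression imputation is only a simplified stand-in for multiple imputation (it does not redraw the regression parameters between imputations).

```python
import math
import random
import statistics


def make_data(n, rng):
    """Hypothetical data: x is always observed, y is a marker correlated with x."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.8 * xi + rng.gauss(0.0, 0.6) for xi in x]
    return x, y


def delete_mcar(y, rng, p=0.5):
    """MCAR: every value of y has the same probability p of being missing."""
    return [None if rng.random() < p else yi for yi in y]


def delete_mar(x, y, rng):
    """MAR: the probability that y is missing depends only on the observed x."""
    return [None if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * xi)) else yi
            for xi, yi in zip(x, y)]


def complete_case_mean(y_obs):
    """Complete case analysis: use only the observed values."""
    return statistics.fmean([v for v in y_obs if v is not None])


def mean_substitute(y_obs):
    """Mean substitution: replace each missing value with the observed mean."""
    m = complete_case_mean(y_obs)
    return [m if v is None else v for v in y_obs]


def regression_impute_mean(x, y_obs, rng, m=5):
    """Simplified stand-in for MI: impute y from x plus noise, m times, and average."""
    pairs = [(a, b) for a, b in zip(x, y_obs) if b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in pairs)
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    resid_sd = math.sqrt(
        sum((b - intercept - slope * a) ** 2 for a, b in pairs) / (len(pairs) - 2))
    means = []
    for _ in range(m):
        filled = [intercept + slope * a + rng.gauss(0.0, resid_sd) if b is None else b
                  for a, b in zip(x, y_obs)]
        means.append(statistics.fmean(filled))
    return statistics.fmean(means)


rng = random.Random(2011)
x, y = make_data(20000, rng)
true_mean = statistics.fmean(y)
true_sd = statistics.pstdev(y)

y_mcar = delete_mcar(y, rng)
y_mar = delete_mar(x, y, rng)

cca_mcar = complete_case_mean(y_mcar)
cca_mar = complete_case_mean(y_mar)
ms_sd_mcar = statistics.pstdev(mean_substitute(y_mcar))
reg_mar = regression_impute_mean(x, y_mar, rng)

print(f"full data:            mean={true_mean:.3f} sd={true_sd:.3f}")
print(f"CCA under MCAR:       mean={cca_mcar:.3f}")
print(f"CCA under MAR:        mean={cca_mar:.3f}")
print(f"MS under MCAR:        sd={ms_sd_mcar:.3f}")
print(f"regression imp., MAR: mean={reg_mar:.3f}")
```

In this toy setting the complete-case mean stays close to the full-data mean under MCAR but drifts under MAR, mean substitution shrinks the spread of the marker, and imputation that uses the observed covariate recovers the mean under MAR. The pattern in a real multi-variate Cox model, as the abstract notes, can differ.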
Pages: 693-699
Page count: 7