Robust data imputation

Cited by: 29
Authors
Branden, Karlien Vanden [2 ]
Verboven, Sabine [1 ]
Affiliations
[1] Univ Antwerp, Dept Math Stat & Actuarial Sci, B-2000 Antwerp, Belgium
[2] Commiss European Communities, Joint Res Ctr, I-21020 Ispra, Italy
Keywords
Missing genes; Microarray data; Imputation methods; Robust statistics; PRINCIPAL COMPONENTS; MISSING DATA; CLASSIFICATION; ESTIMATORS; CANCER; PCA;
DOI
10.1016/j.compbiolchem.2008.07.019
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Single imputation methods have been widely discussed among researchers in the field of bioinformatics. One major shortcoming of the methods proposed so far is the lack of robustness considerations. Like all data, gene expression data can contain outlying values. These outliers can distort the values imputed for the missing entries, and any statistical analysis of the completed data may then lead to incorrect conclusions. It is therefore important to consider the possibility of outliers in the data set and to evaluate how imputation techniques handle these values. In this paper, a simulation study is performed to test existing data imputation techniques when outlying values are present in the data. To overcome some shortcomings of the existing imputation techniques, a new robust imputation method that can deal with the presence of outliers in the data is introduced. In addition, the robust imputation procedure cleans the data for further statistical analysis. Moreover, this method can easily be extended to a multiple imputation approach, which emphasises the uncertainty of the imputed values. Finally, a classification example illustrates the lack of robustness of some existing imputation methods and shows the advantage of the multiple imputation approach of the new robust imputation technique. (c) 2008 Elsevier Ltd. All rights reserved.
Pages: 7-13
Page count: 7
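The abstract's concern that outliers can distort imputed values can be illustrated with a minimal sketch. This is not the robust imputation method introduced in the paper, which the record does not describe in detail. It only contrasts two generic single-imputation fills, the column mean and the column median, on a made-up expression vector. The function name `impute_column` and the data are assumptions chosen for illustration.

```python
import numpy as np


def impute_column(values, robust=False):
    """Fill NaNs with the column mean (classical) or median (robust).

    Hypothetical helper for illustration only; it is not the method
    proposed in the paper.
    """
    values = np.asarray(values, dtype=float)
    # nanmean / nanmedian ignore the missing entries when computing the fill.
    fill = np.nanmedian(values) if robust else np.nanmean(values)
    return np.where(np.isnan(values), fill, values)


# Made-up expression values: one missing entry and one gross outlier (50.0).
expr = np.array([1.0, 1.2, 0.9, 1.1, np.nan, 50.0])

mean_filled = impute_column(expr)                 # fill pulled toward the outlier
median_filled = impute_column(expr, robust=True)  # fill stays near the bulk

print("mean-imputed value:  ", mean_filled[4])
print("median-imputed value:", median_filled[4])
```

In this toy case the single outlier drags the mean-based fill far from the bulk of the observed values, while the median-based fill stays close to them, which is the kind of sensitivity the simulation study in the paper is meant to assess for existing techniques.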