Improving cluster-based missing value estimation of DNA microarray data

Cited by: 65
Authors
Bras, Ligia P. [1]
Menezes, Jose C. [1]
Affiliations
[1] Univ Tecn Lisboa, Ctr Chem & Biol Engn, IST, Dept Chem & Biol Engn, P-1049 Lisbon, Portugal
Source
BIOMOLECULAR ENGINEERING | 2007 / Vol. 24 / Issue 02
Keywords
missing value estimation; K-nearest neighbours; gene expression data; DNA microarray data;
DOI
10.1016/j.bioeng.2007.04.003
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing value (MV) estimation in microarray data, based on the reuse of estimated data. The method is called iterative KNN imputation (IKNNimpute), as the estimation is performed iteratively using the most recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments, and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows the MV estimates to be refined. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes. (c) 2007 Elsevier B.V. All rights reserved.
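The iterative idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`iterative_knn_impute`, `nrmse`), the row-mean initial guess, the inverse-distance weights, the use of all columns (including current estimates) in the distance, and the fixed iteration count are assumptions made for illustration. The NRMSE shown divides by the standard deviation of the true values, which is one common normalization; the abstract does not state which one the paper uses.

```python
import numpy as np

def iterative_knn_impute(X, k=10, n_iter=3):
    """Sketch of iterative weighted KNN imputation (rows = genes).

    Assumes every row has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                      # True where a value is missing
    filled = X.copy()
    # Initial guess (assumption): each gene's mean over its observed values.
    row_means = np.nanmean(X, axis=1)
    filled[mask] = np.broadcast_to(row_means[:, None], X.shape)[mask]
    rows_with_mv = np.where(mask.any(axis=1))[0]
    for _ in range(n_iter):
        updated = filled.copy()
        for i in rows_with_mv:
            # Euclidean distance to every gene on the current filled matrix,
            # so earlier estimates are reused when picking neighbours.
            d = np.sqrt(((filled - filled[i]) ** 2).sum(axis=1))
            d[i] = np.inf                   # a gene is not its own neighbour
            nn = np.argsort(d)[:k]
            w = 1.0 / (d[nn] + 1e-12)       # inverse-distance weights
            cols = np.where(mask[i])[0]
            # Weighted average of the neighbours' values in the missing columns.
            updated[i, cols] = w @ filled[nn][:, cols] / w.sum()
        filled = updated
    return filled

def nrmse(estimated, true):
    """RMSE divided by the standard deviation of the true values."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.sqrt(np.mean((estimated - true) ** 2)) / np.std(true)

# Synthetic low-rank "expression" matrix with about 10% of entries removed.
rng = np.random.default_rng(0)
true = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 8))
X = true.copy()
holes = rng.random(X.shape) < 0.1
holes[:, 0] = False                         # keep one observed value per gene
X[holes] = np.nan
estimate = iterative_knn_impute(X, k=5)
print("NRMSE on removed entries:", nrmse(estimate[holes], true[holes]))
```

Observed entries are never modified; only the positions flagged in `mask` are re-estimated at each pass, which mirrors the "reuse of estimated data" described above.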
Pages: 273-282
Number of pages: 10
Related papers
29 in total
[1]   Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[2]  
[Anonymous], [No title captured]
[3]   LSimpute: accurate estimation of missing values in microarray data with least squares methods [J].
Bo, TH ;
Dysvik, J ;
Jonassen, I .
NUCLEIC ACIDS RESEARCH, 2004, 32 (03) :e34
[4]   Dealing with gene expression missing data [J].
Bras, L. P. ;
Menezes, J. C. .
IEE PROCEEDINGS SYSTEMS BIOLOGY, 2006, 153 (03) :105-119
[5]   Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering [J].
de Brevern, AG ;
Hazout, S ;
Malpertuy, A .
BMC BIOINFORMATICS, 2004, 5 (1)
[6]   Bioconductor: open software development for computational biology and bioinformatics [J].
Gentleman, RC ;
Carey, VJ ;
Bates, DM ;
Bolstad, B ;
Dettling, M ;
Dudoit, S ;
Ellis, B ;
Gautier, L ;
Ge, YC ;
Gentry, J ;
Hornik, K ;
Hothorn, T ;
Huber, W ;
Iacus, S ;
Irizarry, R ;
Leisch, F ;
Li, C ;
Maechler, M ;
Rossini, AJ ;
Sawitzki, G ;
Smith, C ;
Smyth, G ;
Tierney, L ;
Yang, JYH ;
Zhang, JH .
GENOME BIOLOGY, 2004, 5 (10)
[7]  
Ihaka R., 1996, Journal of computational and graphical statistics, V5, P299, DOI [10.1080/10618600.1996.10474713, 10.2307/1390807]
[8]   DNA microarray data imputation and significance analysis of differential expression [J].
Jörnsten, R ;
Wang, HY ;
Welsh, WJ ;
Ouyang, M .
BIOINFORMATICS, 2005, 21 (22) :4155-4161
[9]   Analysis of variance for gene expression microarray data [J].
Kerr, MK ;
Martin, M ;
Churchill, GA .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2000, 7 (06) :819-837
[10]   Missing value estimation for DNA microarray gene expression data: local least squares imputation [J].
Kim, H ;
Golub, GH ;
Park, H .
BIOINFORMATICS, 2005, 21 (02) :187-198