Dealing with gene expression missing data

Cited by: 13
Authors
Bras, L. P. [1]
Menezes, J. C. [1]
Affiliation
[1] Univ Tecn Lisboa, Dept Chem & Biol Engn, Ctr Chem & Biol Engn, IST, P-1049001 Lisbon, Portugal
Source
IEE PROCEEDINGS SYSTEMS BIOLOGY | 2006, Vol. 153, No. 3
DOI
10.1049/ip-syb:20050056
Chinese Library Classification (CLC) number
Q2 [Cell biology]
Subject classification codes
071009; 090102
Abstract
A comparative evaluation of different methods for estimating missing values in microarray data is presented: weighted K-nearest neighbours imputation (KNNimpute), regression-based methods such as local least squares imputation (LLSimpute) and partial least squares imputation (PLSimpute), and Bayesian principal component analysis (BPCA). The influence on prediction accuracy of several factors is elucidated: the methods' parameters, the type of data relationships used in the estimation process (i.e. row-wise, column-wise or both), the missing rate and pattern, and the type of experiment [time series (TS), non-time series (NTS) or mixed (MIX) experiments]. Improvements are proposed based on the iterative use of data (iterative LLS and PLS imputation: ILLSimpute and IPLSimpute), the need to perform initial imputations (modified PLS and Helland PLS imputation: MPLSimpute and HPLSimpute) and the type of relationships employed (KNNarray, LLSarray, HPLSarray and alternating PLS: APLSimpute). Overall, it is shown that data set properties (type of experiment, missing rate and pattern) affect the data similarity structure and thereby influence the methods' performance. LLSimpute and ILLSimpute are preferable for data with a stronger similarity structure (TS and MIX experiments), whereas PLS-based methods (MPLSimpute, IPLSimpute and APLSimpute) are preferable when estimating NTS missing data.
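As an illustration of the simplest method named in the abstract, the following is a minimal sketch of row-wise weighted K-nearest neighbours imputation on a genes-by-arrays matrix. It is not the authors' implementation. The function name `knn_impute`, the inverse-distance weighting, the root-mean-square distance over observed columns and the restriction of candidate neighbours to fully observed rows are all assumptions made for the example.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaNs row-wise using a weighted average of the k nearest complete rows.

    Hypothetical helper: distance and weighting choices are assumptions,
    not details taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()

    # Candidate neighbours: rows (genes) with no missing entries at all.
    complete = np.where(~np.isnan(X).any(axis=1))[0]

    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        if complete.size == 0 or not obs.any():
            continue  # nothing to compare against; leave the gaps as NaN

        # Root-mean-square distance computed only over the observed columns.
        diff = X[complete][:, obs] - X[i, obs]
        dist = np.sqrt(np.mean(diff ** 2, axis=1))

        # Keep the k closest rows and weight them by inverse distance.
        order = np.argsort(dist)[:k]
        nearest = complete[order]
        weights = 1.0 / (dist[order] + 1e-12)
        weights /= weights.sum()

        # Weighted average of the neighbours' values in the missing columns.
        filled[i, miss] = weights @ X[nearest][:, miss]

    return filled


if __name__ == "__main__":
    data = [[1.0, 2.0, np.nan],
            [1.0, 2.0, 3.0],
            [5.0, 6.0, 7.0],
            [9.0, 9.0, 9.0]]
    print(knn_impute(data, k=2))
```

A column-wise variant in the spirit of the array-based methods mentioned above could be obtained by passing the transposed matrix and transposing the result, again under the same assumptions.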
Pages: 105-119
Number of pages: 15