Missing value estimation for DNA microarray gene expression data: local least squares imputation

Cited by: 322
Authors
Kim, H
Golub, GH
Park, H
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] Natl Sci Fdn, Arlington, VA 22230 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/bth499
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed because many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data; they exploit local similarity structures in the data as well as the least squares optimization process.

Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen as the k nearest neighbors or as the k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method based on LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data.
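The idea summarized in the abstract, expressing a target gene as a linear combination of its k most similar genes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lls_impute_row`, the restriction of candidates to genes with no missing values, and the use of Euclidean distance (rather than Pearson correlation) to pick neighbors are all assumptions made for illustration, and k is passed in by hand rather than chosen by an automatic estimator.

```python
import numpy as np

def lls_impute_row(data, target_idx, k):
    """Fill the NaNs of one gene (row) from its k nearest complete genes.

    Hypothetical helper for illustration only; `data` is a genes x experiments
    array in which missing values are encoded as NaN.
    """
    target = data[target_idx]
    miss = np.isnan(target)
    if not miss.any():
        return target.copy()
    obs = ~miss

    # Assumption: only genes without any missing value may serve as neighbors.
    complete = ~np.isnan(data).any(axis=1)
    complete[target_idx] = False
    candidates = data[complete]

    # Similarity: Euclidean distance over the experiments observed in the target.
    dist = np.linalg.norm(candidates[:, obs] - target[obs], axis=1)
    nearest = candidates[np.argsort(dist)[:k]]

    A = nearest[:, obs]   # neighbor values where the target is observed
    B = nearest[:, miss]  # neighbor values where the target is missing

    # Least squares weights x such that A.T @ x approximates the observed part.
    x, *_ = np.linalg.lstsq(A.T, target[obs], rcond=None)

    filled = target.copy()
    filled[miss] = B.T @ x  # same linear combination applied to missing columns
    return filled

# Toy data: the last gene equals 2 * gene0 + 1 * gene1, with one value hidden.
demo = np.array([
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 5.0, np.nan],
])
filled = lls_impute_row(demo, target_idx=2, k=2)
```

In this toy case the hidden value is recovered because the target is an exact combination of its two neighbors; on real expression data the fit is only approximate, and the quality of the estimate depends on the choice of k.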
Pages: 187-198
Page count: 12
Related papers
18 in total
[1]   Singular value decomposition for genome-wide expression data processing and modeling [J].
Alter, O ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (18) :10101-10106
[2]   Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms [J].
Alter, O ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (06) :3351-3356
[3]   LSimpute: accurate estimation of missing values in microarray data with least squares methods [J].
Bo, TH ;
Dysvik, J ;
Jonassen, I .
NUCLEIC ACIDS RESEARCH, 2004, 32 (03) :e34
[4]   New gene selection method for classification of cancer subtypes considering within-class variation [J].
Cho, JH ;
Lee, D ;
Park, JY ;
Lee, IB .
FEBS LETTERS, 2003, 551 (1-3) :3-7
[5]  
FRIEDLAND S, 2003, I MATH ITS APPL PREP, V1948
[6]   Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p [J].
Gasch, AP ;
Huang, MX ;
Metzner, S ;
Botstein, D ;
Elledge, SJ ;
Brown, PO .
MOLECULAR BIOLOGY OF THE CELL, 2001, 12 (10) :2987-3003
[7]  
Golub G. H., 1996, MATRIX COMPUTATIONS
[8]   Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring [J].
Golub, TR ;
Slonim, DK ;
Tamayo, P ;
Huard, C ;
Gaasenbeek, M ;
Mesirov, JP ;
Coller, H ;
Loh, ML ;
Downing, JR ;
Caligiuri, MA ;
Bloomfield, CD ;
Lander, ES .
SCIENCE, 1999, 286 (5439) :531-537
[9]   A Bayesian missing value estimation method for gene expression profile data [J].
Oba, S ;
Sato, M ;
Takemasa, I ;
Monden, M ;
Matsubara, K ;
Ishii, S .
BIOINFORMATICS, 2003, 19 (16) :2088-2096
[10]  
Pearson K., 1894, Philosophical Transactions, V185a, P71, DOI 10.1098/rsta.1894.0003