Adjusting batch effects in microarray expression data using empirical Bayes methods

Cited by: 5187
Authors
Johnson, W. Evan
Li, Cheng [1]
Rabinovic, Ariel
Affiliations
[1] Dana Farber Canc Inst, Dept Biostat & Computat Biol, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[3] Harvard Univ, Sch Publ Hlth, Dept Genet & Complex Dis, Boston, MA 02115 USA
Keywords
batch effects; empirical Bayes; microarrays; Monte Carlo
DOI
10.1093/biostatistics/kxj037
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Non-biological experimental variation, or "batch effects", is commonly observed across multiple batches of microarray experiments, often making it difficult to combine data from these batches. The ability to combine microarray data sets is advantageous to researchers because it increases statistical power to detect biological phenomena in studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (> 25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
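The shrinkage idea behind an empirical Bayes batch adjustment can be illustrated with a minimal sketch. This is not the authors' software and does not reproduce their parametric or non-parametric procedures: it assumes a plain normal-normal model, adjusts batch location only (no scale adjustment, no covariates), and the function name eb_location_adjust and the toy data are hypothetical. Each gene's raw batch offset is pulled toward the batch-wide average offset, more strongly when the offset is estimated from few or noisy samples.

```python
from statistics import mean, pvariance


def eb_location_adjust(data, batches):
    """Remove shrunken per-batch offsets from a genes x samples table.

    data    -- list of rows (one per gene), each a list of expression values
    batches -- batch label for each sample (column)
    Simplified illustration only: location shrinkage under a normal prior.
    """
    labels = sorted(set(batches))
    cols = {b: [j for j, lab in enumerate(batches) if lab == b] for b in labels}
    grand = [mean(row) for row in data]  # per-gene grand mean

    shrunk = {}
    for b in labels:
        n = len(cols[b])
        # Raw offset of this batch from each gene's grand mean.
        raw = [mean([row[j] for j in cols[b]]) - grand[g]
               for g, row in enumerate(data)]
        # Sampling variance of each raw offset (within-batch variance / n).
        noise = [pvariance([row[j] for j in cols[b]]) / n for row in data]
        # Prior for this batch, estimated by pooling offsets across genes.
        prior_mean = mean(raw)
        prior_var = pvariance(raw)
        shrunk[b] = []
        for r, v in zip(raw, noise):
            total = prior_var + v
            # Weight on the gene's own estimate; 1.0 if there is no variance at all.
            w = prior_var / total if total > 0 else 1.0
            shrunk[b].append(w * r + (1 - w) * prior_mean)

    # Subtract the shrunken offset of each sample's batch from its value.
    return [[x - shrunk[batches[j]][g] for j, x in enumerate(row)]
            for g, row in enumerate(data)]


# Toy example (made-up numbers): batch "B" reads about 2 units higher than "A".
batches = ["A", "A", "B", "B"]
data = [
    [1.0, 1.2, 3.1, 3.3],
    [5.0, 5.4, 7.0, 7.2],
    [2.0, 2.2, 4.3, 4.1],
]
adjusted = eb_location_adjust(data, batches)
```

Pooling across genes is what makes the estimate usable when each batch contains only a few arrays: a single gene's offset is unstable on its own, but the batch-wide prior is estimated from every gene at once.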
Pages: 118-127
Number of pages: 10
Related papers
20 in total
  • [1] Singular value decomposition for genome-wide expression data processing and modeling
    Alter, O
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (18) : 10101 - 10106
  • [2] Adjustment of systematic microarray data biases
    Benito, M
    Parker, J
    Du, Q
    Wu, JY
    Xiang, D
    Perou, CM
    Marron, JS
    [J]. BIOINFORMATICS, 2004, 20 (01) : 105 - 114
  • [3] Chen Y, 1997, J Biomed Opt, V2, P364, DOI 10.1117/12.281504
  • [4] Empirical Bayes analysis of a microarray experiment
    Efron, B
    Tibshirani, R
    Storey, JD
    Tusher, V
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) : 1151 - 1160
  • [5] Effects of atmospheric ozone on microarray data quality
    Fare, TL
    Coffey, EM
    Dai, HY
    He, YDD
    Kessler, DA
    Kilian, KA
    Koch, JE
    LeProust, E
    Marton, MJ
    Meyer, MR
    Stoughton, RB
    Tokiwa, GY
    Wang, YQ
    [J]. ANALYTICAL CHEMISTRY, 2003, 75 (17) : 4672 - 4675
  • [6] Bayesian robust inference for differential gene expression in microarrays with multiple samples
    Gottardo, R
    Raftery, AE
    Yeung, KY
    Bumgarner, RE
    [J]. BIOMETRICS, 2006, 62 (01) : 10 - 18
  • [7] Exploration, normalization, and summaries of high density oligonucleotide array probe level data
    Irizarry, RA
    Hobbs, B
    Collin, F
    Beazer-Barclay, YD
    Antonellis, KJ
    Scherf, U
    Speed, TP
    [J]. BIOSTATISTICS, 2003, 4 (02) : 249 - 264
  • [8] On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles
    Kendziorski, CM
    Newton, MA
    Lan, H
    Gould, MN
    [J]. STATISTICS IN MEDICINE, 2003, 22 (24) : 3899 - 3914
  • [9] Array of hope
    Lander, ES
    [J]. NATURE GENETICS, 1999, 21 (Suppl 1) : 3 - 4
  • [10] Li C, 2003, ANAL GENE EXPRESSION, P120