Using control genes to correct for unwanted variation in microarray data

Times cited: 295
Authors
Gagnon-Bartsch, Johann A. [1]
Speed, Terence P. [1, 2]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Melbourne, Vic 3050, Australia
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Batch effect; Control gene; Differential expression; Factor analysis; SVA; Unwanted variation; Quality assessment; Expression; Normalization; Summaries; Variance; Model
DOI
10.1093/biostatistics/kxr034
Chinese Library Classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty of distinguishing the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than the other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that, while there may be promise, substantial challenges remain.
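The abstract describes RUV-2 only at the level of "factor analysis on negative control genes, then adjustment". A minimal NumPy sketch of that two-step idea follows, under assumptions that this record does not state: a linear model Y = Xβ + Wα + ε (arrays in rows, genes in columns), factor analysis carried out as a truncated SVD of the column-centered control-gene submatrix, and the number of unwanted factors k fixed in advance. The function `ruv2`, its arguments, and the simulated data are hypothetical illustrations, not code from the authors or from any published package.

```python
import numpy as np


def ruv2(Y, X, ctl, k):
    """Hypothetical sketch of a two-step RUV-2-style adjustment.

    Assumed model: Y = X @ beta + W @ alpha + noise, where
      Y   : (m, n) log-expression matrix, m arrays by n genes
      X   : (m, p) design matrix for the factor of interest
      ctl : boolean mask of length n marking negative control genes
      k   : number of unwanted factors, assumed known in advance
    Returns the (p, n) estimated coefficients of X for every gene.
    """
    # Step 1: factor analysis restricted to the negative control genes.
    # Assumption: a truncated SVD of the column-centered control submatrix;
    # another factor-analysis method could be substituted here.
    Yc = Y[:, ctl]
    Yc = Yc - Yc.mean(axis=0)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    W_hat = U[:, :k] * s[:k]  # (m, k) estimate of the unwanted factors

    # Step 2: regress every gene on X and W_hat jointly, so the effect of
    # interest is estimated while adjusting for the unwanted factors.
    design = np.hstack([X, W_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[: X.shape[1], :]


# Simulated example (all values invented): a gender indicator g is
# confounded with one unwanted factor W that also affects the controls.
rng = np.random.default_rng(0)
m, n = 40, 200
ctl = np.zeros(n, dtype=bool)
ctl[:50] = True                      # first 50 genes act as negative controls
g = np.repeat([0.0, 1.0], m // 2)
X = np.column_stack([np.ones(m), g])
W = (g + 0.5 * rng.normal(size=m))[:, None]
alpha = 2.0 * rng.normal(size=(1, n))
beta = np.zeros((2, n))
beta[1, 50:70] = 1.0                 # 20 truly differentially expressed genes
Y = X @ beta + W @ alpha + 0.1 * rng.normal(size=(m, n))

naive = np.linalg.lstsq(X, Y, rcond=None)[0]   # ignores unwanted variation
adjusted = ruv2(Y, X, ctl, k=1)
naive_err = np.abs(naive[1] - beta[1]).mean()
ruv_err = np.abs(adjusted[1] - beta[1]).mean()
print(f"mean abs error, unadjusted: {naive_err:.3f}; RUV-2 sketch: {ruv_err:.3f}")
```

Because W is estimated only from genes assumed to carry no signal of interest, the estimate is not contaminated by the gender effect itself, which is the difficulty the abstract raises about separating unwanted from biological variation. With real data, both the choice of k and the choice of control genes would need justification.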
Pages: 539-552
Number of pages: 14