Capturing heterogeneity in gene expression studies by surrogate variable analysis

Cited by: 1261
Authors
Leek, Jeffrey T.
Storey, John D. [1]
Affiliations
[1] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Source
PLOS GENETICS | 2007 / Vol. 3 / Issue 09
Keywords
DOI
10.1371/journal.pgen.0030161
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce "surrogate variable analysis" (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Pages: 1724-1735
Page count: 12