Capturing heterogeneity in gene expression studies by surrogate variable analysis

Cited by: 1261
Authors
Leek, Jeffrey T.
Storey, John D. [1]
Affiliations
[1] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Source
PLOS GENETICS | 2007 / Volume 3 / Issue 09
DOI
10.1371/journal.pgen.0030161
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce "surrogate variable analysis" (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Pages: 1724-1735
Page count: 12