Empirical Bayes analysis of variance component models for microarray data

Times cited: 5
Authors
Feng, S. [1]
Wolfinger, R. D.
Chu, T. M.
Gibson, G. C.
McGraw, L. A.
Affiliations
[1] Duke Univ, Dept Biostat & Bioinformat, Durham, NC 27705 USA
[2] SAS Inst Inc, Cary, NC 27513 USA
[3] N Carolina State Univ, Dept Genet, Raleigh, NC 27695 USA
[4] Cornell Univ, Dept Genet & Dev, Ithaca, NY 14853 USA
Keywords
microarray data analysis; ROC curves; shrinkage estimators
DOI
10.1198/108571106X110676
Chinese Library Classification (CLC)
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
A gene-by-gene mixed model analysis is a useful statistical method for assessing the significance of differential gene expression in microarray data. Although a microarray experiment collects data on thousands of genes, the sample size for each gene is usually small, which can limit the statistical power of this analysis. In this report, we introduce an empirical Bayes (EB) approach for general variance component models applied to microarray data. Within a linear mixed model framework, the restricted maximum likelihood (REML) estimates of each gene's variance components are adjusted by integrating information on the variance components estimated from all genes. The approach starts with a series of single-gene analyses. The estimated variance components from each gene are transformed into "ANOVA components." This transformation makes it possible to estimate the marginal distribution of each "ANOVA component" independently. The modes of the posterior distributions are estimated and inversely transformed to compute the posterior estimates of the variance components. The EB statistic is constructed by replacing the REML variance estimates with the EB variance estimates in the usual t statistic. The EB approach is illustrated with a real data example that compares the effects of five different genotypes of male flies on post-mating gene expression in female flies. In a simulation study, ROC curves are used to compare the EB statistic with two other statistics. The EB statistic was found to be the most powerful of the three. Although the null distribution of the EB statistic is unknown, a t distribution may be used to provide conservative control of the false positive rate.
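The shrinkage idea in the abstract can be illustrated in its simplest special case. The following is a minimal sketch, not the authors' procedure. It assumes a two-group design with a single (residual) variance component, so the only "ANOVA component" is the pooled sample variance s^2 with d degrees of freedom. It also assumes a scaled inverse-chi-square prior on each gene's true variance, with hyperparameters d0 and s0^2. Under these assumptions s^2/s0^2 follows an F(d, d0) marginal distribution, the prior is fitted by marginal maximum likelihood, and the posterior mode of each gene's variance is (d0*s0^2 + d*s^2)/(d0 + d + 2). That posterior mode replaces s^2 in the t statistic. The function names (`log_marginal`, `fit_prior`, `eb_variance`, `two_group_stats`, `auc`), the grid search, the simulated settings, and the two comparison statistics (raw mean difference and ordinary t) are hypothetical choices made for illustration.

```python
import math
import numpy as np

# Hypothetical sketch: one residual variance component, two groups.
# This is NOT the general multi-component EB procedure described in the paper.

def log_marginal(s2, d, d0, s0sq):
    """Log density of s2 when sigma^2 ~ scaled-inv-chi2(d0, s0sq) and
    d * s2 / sigma^2 ~ chi2(d); then s2 / s0sq ~ F(d, d0)."""
    x = s2 / s0sq
    return (math.lgamma((d + d0) / 2) - math.lgamma(d / 2) - math.lgamma(d0 / 2)
            + (d / 2) * math.log(d / d0) + (d / 2 - 1) * np.log(x)
            - ((d + d0) / 2) * np.log1p(d * x / d0) - np.log(s0sq))

def fit_prior(s2, d, n_scale=80):
    """Marginal maximum likelihood for (d0, s0sq) by a coarse grid search."""
    d0_grid = np.linspace(0.5, 60.0, 120)
    # Candidate prior scales spread around the median sample variance.
    s0_grid = np.median(s2) * np.exp(np.linspace(-2.0, 2.0, n_scale))
    best_ll, best_d0, best_s0 = -np.inf, None, None
    for d0 in d0_grid:
        # Total log marginal likelihood over genes, one value per s0 candidate.
        ll = log_marginal(s2[:, None], d, d0, s0_grid[None, :]).sum(axis=0)
        k = int(np.argmax(ll))
        if ll[k] > best_ll:
            best_ll, best_d0, best_s0 = ll[k], d0, s0_grid[k]
    return best_d0, best_s0

def eb_variance(s2, d, d0, s0sq):
    """Posterior mode of sigma^2: Inv-Gamma((d0+d)/2, (d0*s0sq + d*s2)/2)."""
    return (d0 * s0sq + d * s2) / (d0 + d + 2)

def two_group_stats(y1, y2):
    """Per-gene mean difference, pooled variance, its df, and 1/n1 + 1/n2."""
    n1, n2 = y1.shape[1], y2.shape[1]
    d = n1 + n2 - 2
    ss = (((y1 - y1.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
          + ((y2 - y2.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))
    return y1.mean(axis=1) - y2.mean(axis=1), ss / d, d, 1.0 / n1 + 1.0 / n2

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank formula (assumes no ties)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Simulated data (made-up settings): 2000 genes, 3 arrays per group, 10% shifted.
rng = np.random.default_rng(0)
G, n, d0_true, s0sq_true = 2000, 3, 6.0, 0.05
sigma2 = d0_true * s0sq_true / rng.chisquare(d0_true, size=G)  # scaled inv-chi2 draws
is_de = np.zeros(G, dtype=bool)
is_de[:200] = True
shift = np.where(is_de, 0.5, 0.0)
y1 = rng.normal(0.0, np.sqrt(sigma2)[:, None], size=(G, n))
y2 = rng.normal(shift[:, None], np.sqrt(sigma2)[:, None], size=(G, n))

diff, s2, d, c = two_group_stats(y1, y2)
d0_hat, s0sq_hat = fit_prior(s2, d)
s2_eb = eb_variance(s2, d, d0_hat, s0sq_hat)
t_ord = diff / np.sqrt(s2 * c)   # usual t with the per-gene variance
t_eb = diff / np.sqrt(s2_eb * c)  # EB t with the shrunken variance

for name, stat in [("mean difference", diff), ("ordinary t", t_ord), ("EB t", t_eb)]:
    print(f"{name:16s} AUC = {auc(np.abs(stat), is_de):.3f}")
```

The grid search keeps the sketch free of optimizer dependencies; a numerical optimizer over (d0, log s0^2) would be the more usual choice. Because the posterior mode is a convex combination of s^2 and s0^2 divided by a slightly larger denominator, very small sample variances are pulled up. This is what stabilizes the t statistic when only a few arrays per gene are available.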
Pages: 197-209
Page count: 13