Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments

Cited by: 195
Authors
Sartor, Maureen A.
Tomlinson, Craig R.
Wesselkamper, Scott C.
Sivaganesan, Siva
Leikauf, George D.
Medvedovic, Mario [1]
Affiliations
[1] Univ Cincinnati, Dept Environm Hlth, Cincinnati, OH USA
[2] Univ Cincinnati, Ctr Environm Genet, Cincinnati, OH USA
[3] Dartmouth Coll, Hitchcock Med Ctr, Dept Med Pharmacol & Toxicol, Lebanon, NH 03756 USA
[4] Univ Cincinnati, Dept Math Sci, Cincinnati, OH 45221 USA
[5] Cincinnati Childrens Hosp Med Ctr, Biomed Informat Div, Cincinnati, OH USA
Keywords
DOI
10.1186/1471-2105-7-538
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework.

Results: We present a novel Bayesian moderated-T, which we show to perform favorably in simulations, with two real, dual-channel microarray experiments and in two controlled single-channel experiments. In simulations, the new method achieved greater power while correctly estimating the true proportion of false positives, and in the analysis of two publicly-available "spike-in" experiments, the new method performed favorably compared to all tested alternatives. We also applied our method to two experimental datasets and discuss the additional biological insights as revealed by our method in contrast to the others. The R-source code for implementing our algorithm is freely available at http://eh3.uc.edu/ibmt.

Conclusion: We use a Bayesian hierarchical normal model to define a novel Intensity-Based Moderated T-statistic (IBMT). The method is completely data-dependent using empirical Bayes philosophy to estimate hyperparameters, and thus does not require specification of any free parameters. IBMT has the strength of balancing two important factors in the analysis of microarray data: the degree of independence of variances relative to the degree of identity (i.e. t-tests vs. equal variance assumption), and the relationship between variance and signal intensity. When this variance-intensity relationship is weak or does not exist, IBMT reduces to a previously described moderated t-statistic. Furthermore, our method may be directly applied to any array platform and experimental design. Together, these properties show IBMT to be a valuable option in the analysis of virtually any microarray experiment.
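
The idea of shrinking each gene's variance toward an intensity-dependent prior can be illustrated with a short sketch. This is not the authors' R implementation. The function name `intensity_moderated_t`, the quantile-bin step function used for the prior variance, and the fixed prior degrees of freedom `d0` are all assumptions made for illustration, whereas the abstract states that the actual method estimates its hyperparameters from the data by empirical Bayes. The shrinkage formula is the usual moderated-variance form, a weighted average of the prior and the per-gene sample variance.

```python
import numpy as np

def intensity_moderated_t(diff, s2, intensity, d, d0, n_bins=10, v=1.0):
    """Illustrative moderated t-statistic with an intensity-dependent prior.

    diff      : per-gene estimated log fold change
    s2        : per-gene sample variance, each with d degrees of freedom
    intensity : per-gene average log intensity
    d0        : prior degrees of freedom (assumed given here, not estimated)
    n_bins    : number of equal-count intensity bins for the prior (assumption)
    v         : unscaled variance factor of the fold-change estimate
    """
    diff = np.asarray(diff, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # Prior variance as a step function of intensity: genes are sorted by
    # intensity, split into bins, and each bin's mean sample variance is
    # used as the prior variance for the genes in that bin.
    order = np.argsort(intensity)
    s0_sq = np.empty_like(s2)
    for idx in np.array_split(order, n_bins):
        if idx.size:
            s0_sq[idx] = s2[idx].mean()

    # Shrink each gene's variance toward its intensity-specific prior.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)

    # Moderated t-statistic using the shrunken variance.
    t = diff / np.sqrt(v * s2_post)
    return t, s2_post

# Small example with made-up numbers.
t_stat, s2_shrunk = intensity_moderated_t(
    diff=[1.0, 2.0, 3.0, 4.0],
    s2=[1.0, 3.0, 2.0, 6.0],
    intensity=[0.0, 1.0, 2.0, 3.0],
    d=2, d0=2, n_bins=2,
)
```

Two limiting cases mirror the trade-off described in the abstract. With `d0=0` the shrunken variance equals the per-gene sample variance, which corresponds to an ordinary t-test. With `n_bins=1` the prior no longer depends on intensity, so the statistic falls back to a moderated t with a single global prior variance.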
Pages: 17