Multiple hypothesis testing by clustering treatment effects

Cited by: 34
Authors
Dahl, David B. [1]
Newton, Michael A. [2, 3]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[3] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
Keywords
Bayesian nonparametrics; conjugate Dirichlet process mixture model; correlated hypothesis test; DNA microarray; gene expression; model-based clustering
DOI
10.1198/016214507000000211
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Multiple hypothesis testing and clustering have been the subject of extensive research in high-dimensional inference, yet these problems have usually been treated separately. By defining true clusters in terms of shared parameter values, we could improve the sensitivity of individual tests, because more data bearing on the same parameter values are available. We develop and evaluate a hybrid methodology that uses clustering information to increase testing sensitivity and accommodates uncertainty in the true clustering. To investigate the potential efficacy of the hybrid approach, we first study a stylized example in which each object is evaluated with a standard z score but different objects are connected by shared parameter values. We show that testing power increases when the clustering is estimated sufficiently well. We next develop a model-based analysis using a conjugate Dirichlet process mixture model. The method is general, but for specificity we focus on microarray gene expression data, to which both clustering and multiple testing methods are actively applied. Clusters provide the means for sharing information among genes, and the hybrid methodology averages over uncertainty in these clusters through Markov chain sampling. Simulations show that the hybrid method performs substantially better than other methods when clustering is heavy or moderate, and performs well even under weak clustering. The proposed method is illustrated on microarray data from a study of the effects of aging on gene expression in heart tissue.
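The intuition behind the stylized z-score example can be sketched in a few lines of Python. This is not the authors' model (their method uses a conjugate Dirichlet process mixture and Markov chain sampling over clusterings); it only assumes one unit-variance normal observation per object, a putative cluster taken as fixed, and a one-sided test at level alpha. The function name `pooled_z_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def pooled_z_power(true_means, alpha=0.05):
    """Probability that a one-sided pooled z test rejects H0: mean = 0.

    Hypothetical setup: each object i in a putative cluster yields one
    observation x_i ~ N(true_means[i], 1), independently. The pooled
    statistic z = sum(x_i) / sqrt(k) is then N(sum(true_means) / sqrt(k), 1),
    and the test rejects when z exceeds the upper-alpha normal quantile.
    """
    k = len(true_means)
    if k == 0:
        raise ValueError("need at least one object")
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)   # rejection threshold
    shift = sum(true_means) / sqrt(k)        # mean of the pooled z score
    return 1 - STD_NORMAL.cdf(z_crit - shift)


if __name__ == "__main__":
    mu = 0.5
    alone = pooled_z_power([mu])                 # object tested on its own
    right = pooled_z_power([mu] * 4)             # pooled with 3 true cluster-mates
    wrong = pooled_z_power([mu, 0.0, 0.0, 0.0])  # pooled with 3 null objects
    print(round(alone, 3), round(right, 3), round(wrong, 3))
```

Pooling with objects that truly share the parameter shifts the test statistic by roughly sqrt(k) times mu and raises power, while pooling with null objects dilutes the shift and lowers power below that of the unpooled test. This is why, under these assumptions, the gain depends on the clustering being estimated well.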
Pages: 517-526
Number of pages: 10