Bias in the estimation of false discovery rate in microarray studies

Cited by: 58
Authors
Pawitan, Y [1]
Murthy, KRK
Michiels, S
Ploner, A
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, S-17177 Stockholm, Sweden
[2] Genome Inst Singapore, Singapore, Singapore
[3] Inst Gustave Roussy, Unit Biostat & Epidemiol, Villejuif, France
DOI
10.1093/bioinformatics/bti626
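Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];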
Abstract
Motivation: The false discovery rate (FDR) provides a key statistical assessment for microarray studies. Its value depends on the proportion π₀ of non-differentially expressed (non-DE) genes. In most microarray studies, many genes have small effects that are not easily separable from non-DE genes. As a result, current methods often overestimate π₀ and the FDR, leading to unnecessary loss of power in the overall analysis.
Methods: For the common two-sample comparison, we derive a natural mixture model of the test statistic and an explicit formula for the bias in the standard estimation of π₀. We suggest an improved estimation of π₀ based on the mixture model and describe a practical likelihood-based procedure for this purpose.
Results: The analysis shows that a large bias occurs when π₀ is far from 1 and when the non-centrality parameters of the distribution of the test statistic are near zero. The theoretical result also explains substantial discrepancies between non-parametric and model-based estimates of π₀. Simulation studies indicate that mixture-model estimates are less biased than standard estimates. The method is applied to breast cancer and lymphoma data examples.
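The overestimation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' mixture-model procedure or their bias formula. It assumes a simplified setting chosen only for illustration: z-statistics with unit variance, a single non-centrality value `delta` shared by all DE genes, and a commonly used tail-count estimator of the non-DE proportion (the fraction of p-values above a threshold `lam`, rescaled by `1 - lam`). All function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for p-values and quantiles


def two_sided_p(z):
    """Two-sided p-value of a z-statistic under the N(0, 1) null."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def simulate_p_values(m, pi0, delta, rng):
    """Simulate m genes: a fraction pi0 is non-DE (mean 0), the rest have mean delta."""
    m0 = round(m * pi0)
    z_null = [rng.gauss(0.0, 1.0) for _ in range(m0)]
    z_alt = [rng.gauss(delta, 1.0) for _ in range(m - m0)]
    return [two_sided_p(z) for z in z_null + z_alt]


def pi0_tail_estimate(p_values, lam=0.5):
    """Tail-count estimate of the non-DE proportion: #{p > lam} / (m * (1 - lam))."""
    tail = sum(1 for p in p_values if p > lam)
    return tail / (len(p_values) * (1.0 - lam))


def expected_pi0_estimate(pi0, delta, lam=0.5):
    """Expected value of the tail-count estimate in this simplified model.

    Non-DE p-values are uniform, so they contribute exactly pi0. A DE gene
    contributes whenever its p-value exceeds lam, i.e. whenever |z| < c.
    """
    c = STD_NORMAL.inv_cdf(1.0 - lam / 2.0)
    alt = NormalDist(delta, 1.0)
    miss = alt.cdf(c) - alt.cdf(-c)  # P(p > lam) for a DE gene
    return pi0 + (1.0 - pi0) * miss / (1.0 - lam)


rng = random.Random(1)  # fixed seed so the run is reproducible
true_pi0 = 0.6
for delta in (0.5, 4.0):  # small effects versus well-separated effects
    p_values = simulate_p_values(20000, true_pi0, delta, rng)
    print(
        f"delta={delta}: estimate={pi0_tail_estimate(p_values):.3f}, "
        f"expected={expected_pi0_estimate(true_pi0, delta):.3f}, "
        f"true={true_pi0}"
    )
```

Under these assumptions the upward bias equals `(1 - pi0) * miss / (1 - lam)`, so it grows as the true non-DE proportion moves away from 1 and as `delta` approaches zero, which matches the qualitative pattern reported in the Results.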
Pages: 3865-3872
Number of pages: 8
Related papers
15 in total
[1]
CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[2]
A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments [J].
Broët, P ;
Lewin, A ;
Richardson, S ;
Dalmasso, C ;
Magdelenat, H .
BIOINFORMATICS, 2004, 20 (16) :2562-2571
[3]
A simple procedure for estimating the false discovery rate [J].
Dalmasso, C ;
Broët, P ;
Moreau, T .
BIOINFORMATICS, 2005, 21 (05) :660-668
[4]
Empirical Bayes screening of many p-values with applications to microarray studies [J].
Datta, S ;
Datta, S .
BIOINFORMATICS, 2005, 21 (09) :1987-1994
[5]
Large-scale simultaneous hypothesis testing: The choice of a null hypothesis [J].
Efron, B .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2004, 99 (465) :96-104
[6]
Empirical Bayes analysis of a microarray experiment [J].
Efron, B ;
Tibshirani, R ;
Storey, JD ;
Tusher, V .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1151-1160
[7]
A practical false discovery rate approach to identifying patterns of differential expression in microarray data [J].
Grant, GR ;
Liu, JM ;
Stoeckert, CJ .
BIOINFORMATICS, 2005, 21 (11) :2684-2690
[8]
Gene-expression profiles in hereditary breast cancer [J].
Hedenfalk, I ;
Duggan, D ;
Chen, YD ;
Radmacher, M ;
Bittner, M ;
Simon, R ;
Meltzer, P ;
Gusterson, B ;
Esteller, M ;
Kallioniemi, OP ;
Wilfond, B ;
Borg, Å ;
Trent, J ;
Raffeld, M ;
Yakhini, Z ;
Ben-Dor, A ;
Dougherty, E ;
Kononen, J ;
Bubendorf, L ;
Fehrle, W ;
Pittaluga, S ;
Gruvberger, S ;
Loman, N ;
Johannsson, O ;
Olsson, H ;
Sauter, G .
NEW ENGLAND JOURNAL OF MEDICINE, 2001, 344 (08) :539-548
[9]
False discovery rate, sensitivity and sample size for microarray studies [J].
Pawitan, Y ;
Michiels, S ;
Koscielny, S ;
Gusnanto, A ;
Ploner, A .
BIOINFORMATICS, 2005, 21 (13) :3017-3024
[10]
Pawitan Y., 2001, In All Likelihood: Statistical Modelling and Inference Using Likelihood