False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies

Cited by: 8
Authors
Ahmed, Ismail [1]
Hartikainen, Anna-Liisa [2]
Jarvelin, Marjo-Riitta [3]
Richardson, Sylvia [3]
Affiliations
[1] INSERM, F-75654 Paris 13, France
[2] Univ Oulu, Oulu, Finland
[3] Univ London Imperial Coll Sci Technol & Med, London SW7 2AZ, England
Funding
Wellcome Trust (UK); Medical Research Council (UK); Academy of Finland
Keywords
stability selection; GWAS; false discovery rate; variable selection; multiple; variants
DOI
10.2202/1544-6115.1663
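Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology]
Abstract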
Stability Selection, which combines penalized regression with subsampling, is a promising algorithm to perform variable selection in ultra high dimension. This work is motivated by its evaluation in the context of genome-wide association studies (GWAS). One critical aspect for its use lies in the choice of a decision rule that accounts for the massive number of comparisons realised. The current decision rule relies on the control of the Family Wise Error Rate (FWER) by means of an upper bound derived theoretically. Alternatively, we propose to set the detection threshold according to the more liberal false discovery rate (FDR) criterion. The procedure we propose for its estimation relies on permutations. This procedure is evaluated by simulations according to several scenarios mimicking various correlation structures of genetic data and is compared to the original FWER upper bound. The proposed procedure is shown to be less conservative, and able to pick up more true signals than the FWER upper bound. Finally, the proposed methodology is illustrated on a GWAS analysis of a lipid phenotype (high-density lipoproteins, HDL) in the Northern Finland Birth Cohort.
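The idea summarised in the abstract (selection frequencies obtained by subsampling, with a detection threshold judged by a permutation-based FDR estimate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the base selector here is a top-k marginal-correlation screen standing in for penalized regression, the FDR estimate is a generic ratio of the mean number of selections under a permuted phenotype to the number of observed selections, and every function name, parameter, and the simulated data are hypothetical.

```python
import random


def simulate(n, p, signal_idx, effect, seed):
    """Hypothetical toy data: Gaussian predictors and a linear phenotype."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [effect * sum(row[j] for j in signal_idx) + rng.gauss(0, 1) for row in X]
    return X, y


def top_k_marginal(X, y, rows, k):
    """Stand-in base selector (assumption): the k columns with the largest
    absolute marginal association with y, computed on the given rows only."""
    n = len(rows)
    mean_y = sum(y[i] for i in rows) / n
    scores = []
    for j in range(len(X[0])):
        mean_x = sum(X[i][j] for i in rows) / n
        sxy = sum((X[i][j] - mean_x) * (y[i] - mean_y) for i in rows)
        sxx = sum((X[i][j] - mean_x) ** 2 for i in rows)
        scores.append(abs(sxy) / (sxx ** 0.5 + 1e-12))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def selection_frequencies(X, y, n_subsamples, k, rng):
    """Fraction of half-size subsamples in which each variable is selected."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), n // 2)  # subsample without replacement
        for j in top_k_marginal(X, y, rows, k):
            counts[j] += 1
    return [c / n_subsamples for c in counts]


def permutation_fdr(X, y, threshold, n_subsamples=30, k=5, n_perm=10, seed=0):
    """Variables whose selection frequency reaches the threshold, plus a
    generic permutation-based FDR estimate for that threshold."""
    rng = random.Random(seed)
    freq = selection_frequencies(X, y, n_subsamples, k, rng)
    selected = [j for j, f in enumerate(freq) if f >= threshold]
    null_counts = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # permuting the phenotype breaks any association
        null_freq = selection_frequencies(X, y_perm, n_subsamples, k, rng)
        null_counts.append(sum(f >= threshold for f in null_freq))
    expected_false = sum(null_counts) / n_perm
    fdr = min(1.0, expected_false / max(1, len(selected)))
    return selected, fdr


if __name__ == "__main__":
    X, y = simulate(n=100, p=30, signal_idx=[0, 1], effect=3.0, seed=1)
    selected, fdr = permutation_fdr(X, y, threshold=0.8)
    print("selected:", selected, "estimated FDR:", round(fdr, 3))
```

In this sketch, lowering the threshold admits more variables at the cost of a higher estimated FDR, which is the trade-off the abstract contrasts with the stricter FWER bound. In a real analysis one would replace the marginal screen with a penalized regression fit and use far more subsamples and permutations.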
Pages: 21