A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations

Cited by: 161
Authors
Agier, Lydiane [1]
Portengen, Lutzen [2]
Chadeau-Hyam, Marc [3]
Basagana, Xavier [4,5,6]
Giorgis-Allemand, Lise [1]
Siroux, Valerie [1]
Robinson, Oliver [4,5,6]
Vlaanderen, Jelle [2]
Gonzalez, Juan R. [4,5,6]
Nieuwenhuijsen, Mark J. [4,5,6]
Vineis, Paolo [3]
Vrijheid, Martine [4,5,6]
Slama, Remy [1]
Vermeulen, Roel [2,3]
Affiliations
[1] Univ Grenoble Alpes, IAB, CNRS, Team Environm Epidemiol,Inserm, Grenoble, France
[2] Univ Utrecht, Inst Risk Assessment Sci, Utrecht, Netherlands
[3] Imperial Coll London, MRC PHE, Ctr Environm & Hlth, Dept Epidemiol & Biostat,Sch Publ Hlth, London, England
[4] ISGlobal, Ctr Res Environm Epidemiol CREAL, Barcelona, Spain
[5] Univ Pompeu Fabra, Barcelona, Spain
[6] CIBERESP, Madrid, Spain
Keywords
FALSE DISCOVERY RATE; VARIABLE SELECTION; MODELS;
DOI
10.1289/EHP172
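Chinese Library Classification
X [Environmental Science, Safety Science];
Discipline classification code
083001 [Environmental Science];
Abstract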
Background: The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures.
Objectives: We compared the performances of linear regression-based statistical methods in assessing exposome-health associations.
Methods: In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity.
Results: On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates.
Conclusions: Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods.
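The two comparison criteria named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions (FDP as the share of selected covariates that are not true predictors, sensitivity as the share of true predictors that were selected); the function name, the convention of returning 0 for an empty selection, and the toy indices are illustrative assumptions, not taken from the paper.

```python
import math


def fdp_and_sensitivity(selected, true_predictors):
    """Return (FDP, sensitivity) for one simulated data set.

    Hypothetical helper: `selected` holds the covariate indices a method
    flagged, `true_predictors` the indices that truly drive the outcome.
    """
    selected = set(selected)
    true_predictors = set(true_predictors)

    true_positives = len(selected & true_predictors)
    false_positives = len(selected - true_predictors)

    # Assumed convention: no selections means no false discoveries.
    fdp = false_positives / len(selected) if selected else 0.0
    # Sensitivity is undefined when the outcome depends on no covariate.
    sensitivity = (
        true_positives / len(true_predictors) if true_predictors else math.nan
    )
    return fdp, sensitivity


# Toy example: 4 covariates selected, 3 true predictors, 2 in common.
fdp, sens = fdp_and_sensitivity(selected=[1, 2, 3, 4], true_predictors=[1, 2, 5])
print(f"FDP = {fdp:.2f}, sensitivity = {sens:.2f}")
```

In a simulation such as the one described, these two values would be averaged over many generated data sets per scenario; the scenario with 0 true predictors is the case where sensitivity is undefined.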
Pages: 1848-1856
Page count: 9
Related papers
36 in total
[1]
Benjamini Y, 2001, ANN STAT, V29, P1165
[2]
CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[3]
Bonferroni C., 1936, PUBBLICAZIONI R I SU, V8, P3, DOI DOI 10.4135/9781412961288.N455
[4]
Evolutionary Stochastic Search for Bayesian Model Exploration [J].
Bottolo, Leonardo ;
Richardson, Sylvia .
BAYESIAN ANALYSIS, 2010, 5 (03) :583-618
[5]
GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm [J].
Bottolo, Leonardo ;
Chadeau-Hyam, Marc ;
Hastie, David I. ;
Zeller, Tanja ;
Liquet, Benoit ;
Newcombe, Paul ;
Yengo, Loic ;
Wild, Philipp S. ;
Schillert, Arne ;
Ziegler, Andreas ;
Nielsen, Sune F. ;
Butterworth, Adam S. ;
Ho, Weang Kee ;
Castagne, Raphaele ;
Munzel, Thomas ;
Tregouet, David ;
Falchi, Mario ;
Cambien, Francois ;
Nordestgaard, Borge G. ;
Fumeron, Frederic ;
Tybjaerg-Hansen, Anne ;
Froguel, Philippe ;
Danesh, John ;
Petretto, Enrico ;
Blankenberg, Stefan ;
Tiret, Laurence ;
Richardson, Sylvia .
PLOS GENETICS, 2013, 9 (08)
[6]
Deciphering the complex: Methodological overview of statistical models to derive OMICS-based biomarkers [J].
Chadeau-Hyam, Marc ;
Campanella, Gianluca ;
Jombart, Thibaut ;
Bottolo, Leonardo ;
Portengen, Lutzen ;
Vineis, Paolo ;
Liquet, Benoit ;
Vermeulen, Roel C. H. .
ENVIRONMENTAL AND MOLECULAR MUTAGENESIS, 2013, 54 (07) :542-557
[7]
Sparse partial least squares regression for simultaneous dimension reduction and variable selection [J].
Chun, Hyonho ;
Keles, Suenduez .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2010, 72 :3-25
[8]
A REVIEW OF THE EFFECTS OF RANDOM MEASUREMENT ERROR ON RELATIVE RISK ESTIMATES IN EPIDEMIOLOGICAL-STUDIES [J].
DEKLERK, NH ;
ENGLISH, DR ;
ARMSTRONG, BK .
INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 1989, 18 (03) :705-712
[9]
Modified versions of Bayesian Information Criterion for genome-wide association studies [J].
Frommlet, Florian ;
Ruhaltinger, Felix ;
Twarog, Piotr ;
Bogdan, Malgorzata .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2012, 56 (05) :1038-1051
[10]
HIERARCHICAL REGRESSION FOR EPIDEMIOLOGIC ANALYSES OF MULTIPLE EXPOSURES [J].
GREENLAND, S .
ENVIRONMENTAL HEALTH PERSPECTIVES, 1994, 102 :33-39