A systematic comparison of statistical methods to detect interactions in exposome-health associations

Cited by: 57
Authors
Barrera-Gomez, Jose [1, 2, 3]
Agier, Lydiane [4, 5]
Portengen, Lutzen [6]
Chadeau-Hyam, Marc [7]
Giorgis-Allemand, Lise [4, 5]
Siroux, Valerie [4, 5]
Robinson, Oliver [1, 2, 3, 7]
Vlaanderen, Jelle [6]
Gonzalez, Juan R. [1, 2, 3]
Nieuwenhuijsen, Mark [1, 2, 3]
Vineis, Paolo [8]
Vrijheid, Martine [1, 2, 3]
Vermeulen, Roel [6, 7]
Slama, Remy [4, 5]
Basagana, Xavier [1, 2, 3]
Affiliations
[1] ISGlobal, Ctr Res Environm Epidemiol CREAL, Dr Aiguader 88, Barcelona 08003, Spain
[2] UPF, Placa Merce 10-12, Barcelona 08002, Spain
[3] CIBERESP, Av Monforte de Lemos 3-5,Pabellon 11 Planta 0, Madrid 28029, Spain
[4] INSERM, Team Environm Epidemiol Appl Reprod & Resp Hlth, Grenoble, France
[5] Univ Grenoble Alpes, Joint Res Ctr, U823, Grenoble, France
[6] Univ Utrecht, Inst Risk Assessment Sci, Utrecht, Netherlands
[7] Imperial Coll London, Sch Publ Hlth, MRC PHE Ctr Environm & Hlth, Dept Epidemiol & Biostat, Norfolk Pl, London W2 1PG, England
[8] Imperial Coll London, Sch Publ Hlth, MRC PHE Ctr Environm & Hlth, London, England
Keywords
Exposome; Interactions; Variable selection; Regression; Risk; Pollutants; Selection; Profile; Genome
DOI
10.1186/s12940-017-0277-6
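Chinese Library Classification (CLC) number
X [Environmental science, safety science]
Subject classification code
083001 [Environmental science]
Abstract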
Background: There is growing interest in examining the simultaneous effects of multiple exposures and, more generally, the effects of mixtures of exposures, as part of the exposome concept (defined as the totality of human environmental exposures from conception onwards). Uncovering such combined effects is challenging because the number of exposures is large and several of them are highly correlated. We performed a simulation study in an exposome context to compare the performance of several statistical methods that have been proposed to detect statistical interactions.
Methods: Simulations were based on an exposome of 237 exposures with a realistic correlation structure. We considered several regression-based statistical methods: the two-step Environment-Wide Association Study (EWAS2), the Deletion/Substitution/Addition (DSA) algorithm, the Least Absolute Shrinkage and Selection Operator (LASSO), Group-Lasso INTERaction-NET (GLINTERNET), a three-step method based on regression trees, and Boosted Regression Trees (BRT). We assessed the performance of each method in terms of model size, predictive ability, sensitivity and false discovery rate.
Results: GLINTERNET and DSA had better overall performance than the other methods. GLINTERNET was better at selecting the true predictors (sensitivity) and at prediction, while DSA produced fewer false positives. GLINTERNET and DSA were also the best at capturing interaction terms, with the same trade-off between sensitivity and false discovery proportion. When GLINTERNET and DSA failed to select an exposure truly associated with the outcome, they tended to select a highly correlated one. When no interactions were present in the data, variable selection methods that allowed for interactions performed only slightly worse than methods that searched for main effects only.
Conclusions: GLINTERNET and DSA provided better performance in detecting two-way interactions, compared to other existing methods.
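The abstract names sensitivity and false discovery proportion as selection criteria but does not give their formulas. The following is a minimal sketch using the conventional definitions, which is an assumption rather than the authors' exact implementation. The function name `selection_metrics`, the term labels and the `"a:b"` notation for an interaction are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def selection_metrics(selected: Iterable[str], truth: Iterable[str]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery proportion) for one selected model.

    Assumed conventional definitions (not taken from the paper):
      sensitivity = true positives / number of true predictors
      FDP         = false positives / number of selected terms
    """
    selected_set, truth_set = set(selected), set(truth)
    true_pos = len(selected_set & truth_set)
    # Sensitivity is undefined when there are no true predictors.
    sensitivity = true_pos / len(truth_set) if truth_set else float("nan")
    # By convention, an empty selection has no false discoveries.
    fdp = (len(selected_set) - true_pos) / len(selected_set) if selected_set else 0.0
    return sensitivity, fdp


# Hypothetical simulated scenario: two main effects and their interaction are true.
truth = {"x1", "x2", "x1:x2"}
# Hypothetical output of a variable selection method.
selected = {"x1", "x1:x2", "x7", "x9"}

sens, fdp = selection_metrics(selected, truth)
print(f"sensitivity={sens:.2f}, FDP={fdp:.2f}")
```

In a simulation study these two values would typically be averaged over many simulated data sets; the sketch treats an interaction as one more term, so the same function can score main effects and interactions separately by filtering the labels first.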
Pages: 13