Large-scale regression-based pattern discovery: The example of screening the WHO global drug safety database

Cited by: 50
Authors
Caster O. [1, 2]
Norén G.N. [1, 3]
Madigan D. [4 ]
Bate A. [1 ,5 ]
Affiliations
[1] WHO Collaborating Centre for International Drug Monitoring, Uppsala
[2] Department of Computer and Systems Sciences, Stockholm University, Stockholm
[3] Department of Mathematics, Stockholm University, Stockholm
[4] Department of Statistics, Columbia University, New York, NY
[5] Department of Mathematics and Computing, Brunel University, London
Source
Statistical Analysis and Data Mining | 2010 / Volume 3 / Issue 4
Keywords
Adverse drug reaction surveillance; Confounding; Direct and indirect associations; Drug safety; Lasso; Masking; Pharmacovigilance; Shrinkage regression
DOI
10.1002/sam.10078
Abstract
Most measures of interestingness for patterns of co-occurring events are based on data projections onto contingency tables for the events of primary interest. As an alternative, this article presents the first implementation of shrinkage logistic regression for large-scale pattern discovery, with an evaluation of its usefulness in real-world binary transaction data. Regression accounts for the impact of other covariates that may confound or otherwise distort associations. The application considered is international adverse drug reaction (ADR) surveillance, in which large collections of reports on suspected ADRs are screened for interesting reporting patterns worthy of clinical follow-up. Our results show that regression-based pattern discovery does offer practical advantages. Specifically, it can eliminate false positives and false negatives due to other covariates. Furthermore, it identifies some established drug safety issues earlier than a measure based on contingency tables. While regression offers clear conceptual advantages, our results suggest that methods based on contingency tables will continue to play a key role in ADR surveillance, for two reasons: the failure of regression to identify some established drug safety concerns as early as the currently used measures, and the relative lack of transparency of the procedure to estimate the regression coefficients. This suggests shrinkage regression should be used in parallel to existing measures of interestingness in ADR surveillance and other large-scale pattern discovery applications. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 197-208, 2010.
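The contrast the abstract draws between contingency-table measures and shrinkage regression can be illustrated with a toy sketch. It is not the authors' implementation: the report counts are hypothetical, the crude log odds ratio stands in for a contingency-table measure, and a plain proximal-gradient solver with an unpenalized intercept is an assumed substitute for the estimation procedure used in the article. In the made-up data, drug A raises the rate of the ADR and drug B does not, but B is often co-reported with A.

```python
import math

# Hypothetical counts per (drug_a, drug_b) cell: (reports, reports with the ADR).
# The ADR rate is 30% whenever drug A is present and 5% otherwise, regardless of B.
CELLS = {
    (0, 0): (1000, 50),
    (1, 0): (200, 60),
    (0, 1): (200, 10),
    (1, 1): (400, 120),
}


def crude_log_odds_ratio(cells, drug_index):
    """Log odds ratio from the 2x2 table of one drug versus the ADR."""
    with_event = [0, 0]
    without_event = [0, 0]
    for exposure, (n, k) in cells.items():
        x = exposure[drug_index]
        with_event[x] += k
        without_event[x] += n - k
    exposed_odds = with_event[1] / without_event[1]
    unexposed_odds = with_event[0] / without_event[0]
    return math.log(exposed_odds / unexposed_odds)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(cells, penalty=0.001, step=1.0, iterations=5000):
    """L1-penalized logistic regression on both drugs via proximal gradient.

    The loss is the mean negative log-likelihood; the intercept is not
    penalized (an assumption made for this sketch).
    """
    total = sum(n for n, _ in cells.values())
    intercept, coefs = 0.0, [0.0, 0.0]
    for _ in range(iterations):
        grad_intercept, grad = 0.0, [0.0, 0.0]
        for exposure, (n, k) in cells.items():
            z = intercept + sum(c * x for c, x in zip(coefs, exposure))
            p = 1.0 / (1.0 + math.exp(-z))
            residual = (k - n * p) / total  # observed minus expected events
            grad_intercept += residual
            for j, x in enumerate(exposure):
                grad[j] += residual * x
        intercept += step * grad_intercept
        coefs = [
            soft_threshold(c + step * g, step * penalty)
            for c, g in zip(coefs, grad)
        ]
    return intercept, coefs


crude_b = crude_log_odds_ratio(CELLS, drug_index=1)
intercept, (beta_a, beta_b) = fit_lasso_logistic(CELLS)
print(f"crude log odds ratio for drug B: {crude_b:.3f}")
print(f"lasso coefficients: drug A = {beta_a:.3f}, drug B = {beta_b:.3f}")
```

In this constructed case the crude measure flags drug B only because it is co-reported with drug A, while the regression that includes both drugs assigns the association to A and leaves B's coefficient at or near zero.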
Pages: 197-208
Page count: 11