Large-scale regression-based pattern discovery: The example of screening the WHO global drug safety database

Cited by: 50
Authors
Caster O. [1, 2]
Norén G.N. [1, 3]
Madigan D. [4]
Bate A. [1, 5]
Affiliations
[1] WHO Collaborating Centre for International Drug Monitoring, Uppsala
[2] Department of Computer and Systems Sciences, Stockholm University, Stockholm
[3] Department of Mathematics, Stockholm University, Stockholm
[4] Department of Statistics, Columbia University, New York, NY
[5] Department of Mathematics and Computing, Brunel University, London
Source
Statistical Analysis and Data Mining | 2010 / Vol. 3 / Issue 4
Keywords
Adverse drug reaction surveillance; Confounding; Direct and indirect associations; Drug safety; Lasso; Masking; Pharmacovigilance; Shrinkage regression
DOI
10.1002/sam.10078
Abstract
Most measures of interestingness for patterns of co-occurring events are based on data projections onto contingency tables for the events of primary interest. As an alternative, this article presents the first implementation of shrinkage logistic regression for large-scale pattern discovery, with an evaluation of its usefulness in real-world binary transaction data. Regression accounts for the impact of other covariates that may confound or otherwise distort associations. The application considered is international adverse drug reaction (ADR) surveillance, in which large collections of reports on suspected ADRs are screened for interesting reporting patterns worthy of clinical follow-up. Our results show that regression-based pattern discovery does offer practical advantages. Specifically, it can eliminate false positives and false negatives due to other covariates. Furthermore, it identifies some established drug safety issues earlier than a measure based on contingency tables. While regression offers clear conceptual advantages, our results suggest that methods based on contingency tables will continue to play a key role in ADR surveillance, for two reasons: the failure of regression to identify some established drug safety concerns as early as the currently used measures, and the relative lack of transparency of the procedure to estimate the regression coefficients. This suggests shrinkage regression should be used in parallel with existing measures of interestingness in ADR surveillance and other large-scale pattern discovery applications. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 197-208, 2010.
Pages: 197-208
Number of pages: 11