Learning from imbalanced data in surveillance of nosocomial infection

Cited by: 155
Authors
Cohen, Gilles [1]
Hilario, Melanie
Sax, Hugo
Hugonnet, Stephane
Geissbuhler, Antoine
Affiliations
[1] Univ Hosp Geneva, Med Informat Serv, Geneva, Switzerland
[2] Univ Geneva, Artificial Intelligence Lab, Geneva, Switzerland
[3] Univ Hosp Geneva, Dept Internal Med, Geneva, Switzerland
Keywords
nosocomial infection; machine learning; support vector machines; data imbalance
DOI
10.1016/j.artmed.2005.03.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: An important problem that arises in hospitals is the monitoring and detection of nosocomial, or hospital-acquired, infections (NIs). This paper describes a retrospective analysis of a prevalence survey of NIs conducted at the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey. Methods and material: Standard surveillance strategies are time-consuming and cannot be applied hospital-wide; alternative methods are required. When NI detection is viewed as a classification task, the main difficulty lies in the significant imbalance between positive, or infected, cases (11%) and negative cases (89%). To remedy class imbalance, we explore two distinct avenues: (1) a new resampling approach in which both oversampling of rare positives and undersampling of the noninfected majority rely on synthetic cases (prototypes) generated via class-specific subclustering, and (2) a support vector algorithm in which asymmetrical margins are tuned to improve recognition of rare positive cases. Results and conclusion: Experiments have shown both approaches to be effective for the NI detection problem. Our novel resampling strategies perform remarkably better than classical random resampling. However, they are outperformed by asymmetrical soft margin support vector machines, which attained a sensitivity rate of 92%, significantly better than the highest sensitivity (87%) obtained via prototype-based resampling. © 2005 Published by Elsevier B.V.
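The effect of asymmetrical soft margins described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes one common formulation in which slack is penalized with separate per-class costs (here called `c_pos` and `c_neg`), and every name and data point below is hypothetical. The toy set has 8 negatives and 1 positive (about 11% positives) and compares two fixed candidate decision rules rather than training a full SVM.

```python
"""Toy illustration of an asymmetric soft-margin objective on imbalanced data."""

# Hypothetical 1-D toy data: 8 negatives (-1) and 1 positive (+1), i.e. about
# 11% positives. The values are invented for illustration only.
XS = [-2.0] * 6 + [0.5, 0.5] + [0.0]
YS = [-1] * 8 + [1]

# Two hypothetical candidate rules (w, b) for the classifier sign(w*x + b).
CANDIDATES = {
    "conservative": (1.0, -1.0),  # threshold at x = 1: labels every case negative
    "sensitive": (1.0, 1.0),      # threshold at x = -1: catches the positive case
}


def predict(w, b, xs):
    """Linear decision rule: +1 if w*x + b > 0, else -1."""
    return [1 if w * x + b > 0 else -1 for x in xs]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def asymmetric_hinge_loss(w, b, xs, ys, c_pos, c_neg):
    """Soft-margin objective with per-class slack costs (assumed formulation):
    0.5 * w^2 + sum_i C(y_i) * max(0, 1 - y_i * (w * x_i + b)).
    """
    penalty = 0.0
    for x, y in zip(xs, ys):
        slack = max(0.0, 1.0 - y * (w * x + b))
        penalty += (c_pos if y == 1 else c_neg) * slack
    return 0.5 * w * w + penalty


def preferred(c_pos, c_neg):
    """Name of the candidate rule with the lower objective value."""
    return min(
        CANDIDATES,
        key=lambda name: asymmetric_hinge_loss(*CANDIDATES[name], XS, YS, c_pos, c_neg),
    )


if __name__ == "__main__":
    for name, (w, b) in CANDIDATES.items():
        sens, spec = sensitivity_specificity(YS, predict(w, b, XS))
        print(name, "sensitivity:", sens, "specificity:", spec)
    print("symmetric costs prefer:", preferred(1.0, 1.0))
    print("asymmetric costs prefer:", preferred(8.0, 1.0))
```

With equal costs, the objective favors the conservative rule (3.5 versus 5.5), which is correct on 8 of 9 cases yet has zero sensitivity. Raising the positive-class cost to 8 makes missing the single positive expensive (17.5 versus 5.5), so the sensitive rule wins at the price of two false alarms. The cost ratio of 8:1 simply mirrors the class ratio in this toy set; in practice it would be tuned.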
Pages: 7-18
Page count: 12