Automatic online news monitoring and classification for syndromic surveillance

Cited by: 45
Authors
Zhang, Yulei [1]
Dang, Yan [1]
Chen, Hsinchun [1]
Thurmond, Mark [2]
Larson, Cathy [1]
Affiliations
[1] Univ Arizona, Eller Coll Management, Dept Management Informat Syst, Artificial Intelligence Lab, Tucson, AZ 85721 USA
[2] Univ Calif Davis, CADMS, FMD Lab, Davis, CA 95616 USA
Funding
US National Science Foundation
Keywords
News classification; News monitoring; Feature selection; Syndromic surveillance; SELECTION PROBLEM; WEB; MESSAGES; SYSTEMS; SEARCH;
DOI
10.1016/j.dss.2009.04.016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Syndromic surveillance can play an important role in protecting the public's health against infectious diseases. Infectious disease outbreaks can have a devastating effect on society as well as the economy, and global awareness is therefore critical to protecting against major outbreaks. By monitoring online news sources and developing an accurate news classification system for syndromic surveillance, public health personnel can be apprised of outbreaks and potential outbreak situations. In this study, we have developed a framework for automatic online news monitoring and classification for syndromic surveillance. The framework is unique, and none of the techniques adopted in this study have been previously used in the context of syndromic surveillance on infectious diseases. In recent classification experiments, we compared the performance of different feature subsets on different machine learning algorithms. The results showed that the combined feature subsets, including Bag of Words, Noun Phrases, and Named Entities features, outperformed the Bag of Words feature subsets. Furthermore, feature selection improved the performance of feature subsets in online news classification. The highest classification performance was achieved when using SVM on the selected combination feature subset. © 2009 Elsevier B.V. All rights reserved.
Pages: 508-517
Page count: 10