Categorization and Analysis of Text in Computer Mediated Communication Archives using Visualization

Cited by: 14
Authors
Abbasi, Ahmed [1]
Chen, Hsinchun [1]
Affiliations
[1] Univ Arizona, Dept Management Informat Syst, Artificial Intelligence Lab, Tucson, AZ 85721 USA
Source
PROCEEDINGS OF THE 7TH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: BUILDING & SUSTAINING THE DIGITAL ENVIRONMENT | 2007
Keywords
Visualization; Text Mining; Computer Mediated Communication
DOI
10.1145/1255175.1255178
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to improve the analysis and navigation of computer-mediated communication (CMC) archives focus primarily on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set composed of stylistic, topical, and sentiment-related features to enable richer content representation. The system also includes the Ink Blot technique, which uses decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, and messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and highlighted the effectiveness of the extended feature set in improving text categorization.
Pages: 11-18
Page count: 8