Conversation Extraction in Dynamic Text Message Stream

Cited by: 4
Authors
Wang, Le [1]
Jia, Yan [2 ]
Chen, Yingwen [2 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Comp Applicat, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha, Hunan, Peoples R China
Keywords
text message; conversation extraction; content similarity; linguistic feature
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Text message streams produced by instant messengers and Internet Relay Chat pose interesting and challenging problems for information technologies. Extracting the conversations in this kind of chat message stream benefits information management and knowledge finding. However, the data in a text message stream are usually very short and incomplete, and monitoring thousands of continuous chat sessions demands efficiency, so many existing text mining methods encounter challenges. This paper focuses on conversation extraction in dynamic text message streams. We design a dynamic representation for messages that combines the text content information and linguistic features in the message stream. A memory structure of reversed maximal similar relationship is developed for renewable assignments when grouping messages into conversations. Finally, we propose a double time window algorithm based on the above methods to extract conversations in dynamic text message streams. Experiments on a real dataset show that our method outperforms two baseline methods introduced in a recent related paper by about 47% and 15%, respectively, in terms of F-measure.
Pages: 86-93
Number of pages: 8