Text analysis and knowledge mining system

Cited by: 96
Authors
Nasukawa, T [1]
Nagano, T [1]
Affiliations
[1] IBM Corp, Div Res, Tokyo Res Lab, Kanagawa, Japan
Keywords
Data mining - Database systems - Information retrieval systems - Personal computers - Statistical methods - Text processing
DOI
10.1147/sj.404.0967
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Large text databases potentially contain a great wealth of knowledge. However, text represents factual information (and information about the author's communicative intentions) in a complex, rich, and opaque manner. Consequently, unlike numerical and fixed field data, it cannot be analyzed by standard statistical data mining methods. Relying on human analysis results in either huge workloads or the analysis of only a tiny fraction of the database. We are working on text mining technology to extract knowledge from very large amounts of textual data. Unlike information retrieval technology that allows a user to select documents that meet the user's requirements and interests, or document clustering technology that organizes documents, we focus on finding valuable patterns and rules in text that indicate trends and significant features about specific topics. By applying our prototype system named TAKMI (Text Analysis and Knowledge Mining) to textual databases in PC help centers, we can automatically detect product failures; determine issues that have led to rapid increases in the number of calls and their underlying reasons; and analyze help center productivity and changes in customers' behavior involving a particular product, without reading any of the text. We have verified that our framework is also effective for other data such as patent documents.
Pages: 967-984
Page count: 18