Towards a text mining methodology using association rule extraction

Cited by: 11
Authors
Cherfi, H [1 ]
Napoli, A [1 ]
Toussaint, Y [1 ]
Affiliations
[1] Nancy Univ, INRIA, CNRS, LORIA, F-54506 Vandoeuvre Les Nancy, France
Keywords
Molecular Biology; Mathematical Logic; Quality Measure; Association Rule; Knowledge Discovery;
DOI
10.1007/s00500-005-0504-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a text mining methodology based on the classical knowledge discovery loop, with a number of adaptations. First, texts are indexed and prepared for processing by a levelwise frequent itemset search. Association rules are then extracted and interpreted with respect to a set of quality measures and domain knowledge, under the control of an analyst. The article includes an experiment on a real-world text corpus on molecular biology.
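The two mining steps named in the abstract, levelwise frequent itemset search followed by association rule extraction, can be illustrated with a minimal Apriori-style sketch. It is not the authors' implementation: the function names, the toy index terms, and the thresholds are all hypothetical, and only support and confidence are computed, whereas the paper relies on a wider set of quality measures plus domain knowledge.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Levelwise search: candidates of size k+1 are built from frequent k-itemsets."""
    n = len(transactions)
    # Level 1: every single term is a candidate.
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while level:
        # Count how many texts contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(kept)
        keys = list(kept)
        k = len(keys[0]) + 1 if keys else 0
        # Join step: merge pairs of frequent itemsets into size-k candidates.
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules antecedent -> consequent from the frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always available in the dictionary.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


# Hypothetical toy corpus: each text is reduced to its set of index terms.
docs = [
    frozenset({"gene", "mutation", "resistance"}),
    frozenset({"gene", "mutation"}),
    frozenset({"gene", "resistance"}),
    frozenset({"mutation", "resistance", "antibiotic"}),
]
freq = frequent_itemsets(docs, min_support=0.5)
rules = association_rules(freq, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, round(confidence, 2))
```

In this sketch the analyst-driven interpretation step is reduced to a single confidence threshold; ranking the rules by further quality measures would be layered on top of the returned list.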
Pages: 431-441
Number of pages: 11