Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large data set

Cited by: 56
Authors
Plasse, Marie
Niang, Ndeye
Saporta, Gilbert
Villeminot, Alexandre
Leblond, Laurent
Affiliations
[1] Conservatoire Natl Arts & Metiers, Lab CEDRIC, F-75141 Paris, France
[2] PSA Peugeot Citroen, Zone Aeronaut Louis Breguet, F-78943 Velizy Villacoublay, France
Keywords
association rules mining; variable clustering; large sparse matrix; binary attributes; rule relevancy index
DOI
10.1016/j.csda.2007.02.020
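Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203 [Computer application technology]; 0835 [Software engineering]
Abstract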
A method to analyse links between binary attributes in a large sparse data set is proposed. Initially the variables are clustered to obtain homogeneous clusters of attributes. Association rules are then mined in each cluster. A graphical comparison of some rule relevancy indexes is presented. It is used to extract the best rules depending on the application concerned. The proposed methodology is illustrated by an industrial application from the automotive industry with more than 80,000 vehicles, each described by more than 3,000 rare attributes. © 2007 Elsevier B.V. All rights reserved.
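The two-step idea in the abstract (cluster the attributes first, then mine rules inside each cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which clustering method or relevancy indexes the authors use, so this sketch swaps in a Jaccard similarity with a fixed threshold for the clustering step and plain support and confidence for the rules. The toy data and the function names (`jaccard`, `cluster_attributes`, `mine_pair_rules`) are hypothetical.

```python
from itertools import combinations, permutations


def jaccard(a, b):
    """Jaccard similarity between two sets of row indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_attributes(rows_by_attr, threshold):
    """Group attributes linked by a Jaccard similarity >= threshold.

    Assumption: a simple single-linkage grouping via union-find, standing in
    for whatever variable clustering method the paper actually applies.
    """
    parent = {attr: attr for attr in rows_by_attr}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(rows_by_attr, 2):
        if jaccard(rows_by_attr[a], rows_by_attr[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for attr in rows_by_attr:
        clusters.setdefault(find(attr), []).append(attr)
    return sorted(sorted(members) for members in clusters.values())


def mine_pair_rules(rows_by_attr, cluster, n_rows, min_support, min_confidence):
    """Return rules (a, b, support, confidence) for a -> b within one cluster."""
    rules = []
    for a, b in permutations(cluster, 2):
        both = len(rows_by_attr[a] & rows_by_attr[b])
        support = both / n_rows
        confidence = both / len(rows_by_attr[a]) if rows_by_attr[a] else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical toy data: each vehicle is the set of binary attributes it has.
vehicles = [
    {"A", "B"}, {"A", "B"}, {"A"},
    {"C", "D"}, {"C", "D"}, {"D"},
    set(), set(),
]

# Invert to attribute -> set of row indices, a compact sparse representation.
rows_by_attr = {}
for row, attrs in enumerate(vehicles):
    for attr in attrs:
        rows_by_attr.setdefault(attr, set()).add(row)

clusters = cluster_attributes(rows_by_attr, threshold=0.5)
rules_ab = mine_pair_rules(
    rows_by_attr, clusters[0], n_rows=len(vehicles),
    min_support=0.2, min_confidence=0.6,
)
```

Restricting the rule search to one cluster at a time keeps the number of candidate rules small, which is the practical motivation for clustering rare attributes before mining them.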
Pages: 596-613
Number of pages: 18