A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs

Cited by: 128
Authors
De Weerdt, Jochen [1]
De Backer, Manu [1, 2]
Vanthienen, Jan [1]
Baesens, Bart [1, 3]
Affiliations
[1] Katholieke Univ Leuven, Dept Decis Sci & Informat Management, B-3000 Louvain, Belgium
[2] Univ Ghent, Hogesch Gent, Dept Business Adm & Publ Management, B-9000 Ghent, Belgium
[3] Univ Southampton, Sch Management, Southampton SO17 1BJ, Hants, England
Keywords
Process mining; Benchmarking; Real-life event logs; Accuracy; Comprehensibility; MINING PROCESS MODELS
DOI
10.1016/j.is.2012.02.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research and for the use of process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. For instance, we show that evaluation based on real-life event logs significantly differs from the traditional approach to assess process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and we describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited in a real-life setting. However, it is also shown that, particularly for highly complex event logs, knowledge discovery from such data sets can become a major problem for traditional process discovery techniques. (C) 2012 Elsevier Ltd. All rights reserved.
Pages: 654-676
Page count: 23