IT Ticket Classification: The Simpler, the Better

Cited by: 23
Authors
Revina, Aleksandra [1, 2]
Buza, Krisztian [3]
Meister, Vera G. [2]
Affiliations
[1] Tech Univ Berlin, Fac Econ & Management, Chair Informat & Commun Management, D-10623 Berlin, Germany
[2] Brandenburg Univ Appl Sci, Fac Econ, D-14770 Brandenburg, Germany
[3] Eotvos Lorand Univ, Fac Informat, H-1117 Budapest, Hungary
Keywords
Linguistics; Complexity theory; Feature extraction; Task analysis; Training; Prediction algorithms; Decision trees; IT tickets; linguistics; machine learning; text classification; TF-IDF; process complexity; SERVICE MANAGEMENT; TEXT CLASSIFICATION; DECISION-MAKING; SOFTWARE; REGRESSION; FEATURES
DOI
10.1109/ACCESS.2020.3032840
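Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
080201 [Mechanical Manufacturing and Automation]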
Abstract
Recently, automatic classification of IT tickets has gained notable attention due to the increasing complexity of IT services deployed in enterprises. There is ongoing discussion and no consensus in the research and practitioner communities on the design of IT ticket classification tasks, specifically the choice of ticket text representation techniques and classification algorithms. Our study aims to investigate the core design elements of a typical IT ticket text classification pipeline. In particular, we compare the performance of TF-IDF and linguistic features-based text representations designed for ticket complexity prediction. We apply various classifiers, including kNN, its enhanced versions, decision trees, naive Bayes, logistic regression, support vector machines, as well as semi-supervised techniques, to predict the ticket class label of low, medium, or high complexity. Finally, we discuss the evaluation results and their practical implications. As our study shows, the linguistic representation not only proves to be highly explainable but also demonstrates a substantial increase in prediction quality over TF-IDF. Furthermore, our experiments demonstrate the importance of feature selection. We show that even simple algorithms can deliver high-quality prediction when using appropriate linguistic features.
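The TF-IDF baseline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tickets and their complexity labels are invented for illustration, the smoothed IDF formula and cosine-similarity kNN are assumptions, and all function names (`fit_idf`, `tfidf`, `knn_predict`) are hypothetical. It only shows the general shape of a "represent text, then classify as low, medium, or high" pipeline using the Python standard library.

```python
import math
from collections import Counter

# Hypothetical toy tickets with made-up complexity labels (not data from the paper).
TRAIN = [
    ("reset password for user account", "low"),
    ("unlock user account password expired", "low"),
    ("install software update on laptop", "medium"),
    ("update software license on workstation", "medium"),
    ("migrate database cluster to new datacenter", "high"),
    ("redesign network firewall for datacenter migration", "high"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer; real pipelines usually do more."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency (an assumed variant of IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, idf, k=1):
    """Majority vote among the k most similar training tickets."""
    query = tfidf(tokenize(text), idf)
    ranked = sorted(train_vecs, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(docs)
train_vecs = [(tfidf(doc, idf), label) for doc, (_, label) in zip(docs, TRAIN)]

print(knn_predict("password reset for account", train_vecs, idf))
print(knn_predict("migrate database to datacenter", train_vecs, idf, k=3))
```

To compare representations in the spirit of the study, the `tfidf` step could be swapped for a function that returns hand-crafted linguistic features while the classifier stays unchanged; which features to use is not specified in this record.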
Pages: 193380-193395
Number of pages: 16