Applying agglomerative hierarchical clustering algorithms to component identification for legacy systems

Cited by: 54
Authors
Cui, Jian Feng [2]
Chae, Heung Seok [1]
Affiliations
[1] Pusan Natl Univ, Dept Sci & Engn, Pusan 609735, South Korea
[2] Xiamen Univ Technol, Dept Comp Sci & Technol, Xiamen 361024, Peoples R China
Keywords
Component identification; Agglomerative hierarchical clustering algorithm; Weighting scheme; Similarity measure; Legacy systems; Software reengineering;
DOI
10.1016/j.infsof.2011.01.006
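Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract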
Context: Component identification, the process of evolving legacy systems into finely organized component-based software systems, is a critical part of software reengineering. Many component identification approaches have been developed based on agglomerative hierarchical clustering algorithms. However, there has been no thorough investigation of which algorithm is appropriate for component identification.
Objective: This paper analyzes agglomerative hierarchical clustering algorithms in software reengineering and identifies their respective strengths and weaknesses so that they can be applied effectively in practice.
Method: A series of experiments was conducted on 18 clustering strategies, each a combination of a similarity measure, a weighting scheme and a linkage method. Eleven subject systems with different application domains and source code sizes were used in the experiments. The component identification results were evaluated by the proposed size, coupling and cohesion criteria.
Results: The experimental results suggest that the similarity measures, weighting schemes and linkage methods employed affect component identification results differently with respect to the proposed size, coupling and cohesion criteria, so the hierarchical clustering algorithms produced quite different clustering results.
Conclusions: According to the experimental results, it is difficult for any given clustering algorithm to produce perfectly satisfactory results. Nevertheless, the algorithms demonstrated varied capabilities to identify components with respect to the proposed size, coupling and cohesion criteria.
(C) 2011 Elsevier B.V. All rights reserved.
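To make the terminology in the abstract concrete, the following is a minimal sketch of one possible clustering strategy. It is not the paper's implementation: the choice of Jaccard similarity, binary (unweighted) features, complete linkage, the 0.3 stopping threshold, and the entity and feature names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def complete_linkage(c1, c2, features):
    """Cluster similarity is that of the least similar cross-cluster pair."""
    return min(jaccard(features[x], features[y]) for x in c1 for y in c2)


def agglomerate(features, linkage=complete_linkage, threshold=0.3):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best available similarity falls below `threshold`
    (a hypothetical cut-off, chosen only for this example).
    """
    # Start with one singleton cluster per entity.
    clusters = [frozenset([name]) for name in sorted(features)]
    while len(clusters) > 1:
        best_pair = max(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], features),
        )
        if linkage(best_pair[0], best_pair[1], features) < threshold:
            break
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in best_pair]
        clusters.append(best_pair[0] | best_pair[1])
    return clusters


# Hypothetical entities and the features (e.g. called operations) each one uses.
features = {
    "Order": {"db.save", "tax.compute", "log.write"},
    "Invoice": {"db.save", "tax.compute"},
    "Parser": {"io.read", "lex.next"},
    "Lexer": {"io.read", "lex.next", "log.write"},
}
components = agglomerate(features)
```

Swapping `jaccard` for another similarity measure, weighting the features, or passing a different `linkage` function yields a different strategy, which is the kind of variation the study compares.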
Pages: 601-614
Number of pages: 14