A genetic algorithm approach to cluster analysis

被引:102
作者
Cowgill, MC [1 ]
Harvey, RJ
Watson, LT
机构
[1] Virginia Polytech Inst & State Univ, Dept Psychol, Blacksburg, VA 24061 USA
[2] Virginia Polytech Inst & State Univ, Dept Comp Sci, Blacksburg, VA 24061 USA
[3] Virginia Polytech Inst & State Univ, Dept Math, Blacksburg, VA 24061 USA
关键词
cluster analysis; data clustering; genetic algorithm; global optimization; variance-ratio maximization;
D O I
10.1016/S0898-1221(99)00090-5
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
A common problem in the social and agricultural sciences is to find clusters in experimental data; the standard attack is a deterministic search terminating in a locally optimal clustering. We propose here a genetic algorithm (GA) for performing cluster analysis. GAs have been used profitably in a variety of contexts in which it is either impractical or impossible to directly solve for a globally optimal solution to complex numerical problems. In the present case, our GA clustering technique attempted to maximize a variance-ratio (VR) based goodness-of-fit criterion defined in terms of external cluster isolation and internal cluster homogeneity. Although our GA-based clustering algorithm cannot guarantee to recover the cluster solution that exhibits the global maximum of this fitness function; it does explicitly work toward this goal (in marked contrast to existing clustering algorithms, especially hierarchical agglomerative ones such as Ward's method). Using both constrained and unconstrained simulated datasets, Monte Carlo results showed that in some conditions the genetic clustering algorithm did indeed surpass the performance of conventional clustering techniques (Ward's and K-means) in terms of an internal (VR) criterion. Suggestions for future refinement and study are offered. (C) 1999 Elsevier Science Ltd. All rights reserved.
引用
收藏
页码:99 / 108
页数:10
相关论文
共 50 条
  • [1] Anderberg M.R., 1973, Probability and Mathematical Statistics
  • [2] [Anonymous], 1991, Handbook of genetic algorithms
  • [3] BALAKRISHNAN PV, 1994, PSYCHOMETRIKA, V59, P509
  • [4] A COMPARISON OF 2 APPROACHES TO BETA-FLEXIBLE CLUSTERING
    BELBIN, L
    FAITH, DP
    MILLIGAN, GW
    [J]. MULTIVARIATE BEHAVIORAL RESEARCH, 1992, 27 (03) : 417 - 433
  • [5] Belew R., 1991, P 4 INT C GEN ALG SA
  • [6] MIXTURE MODEL TESTS OF CLUSTER-ANALYSIS - ACCURACY OF 4 AGGLOMERATIVE HIERARCHICAL METHODS
    BLASHFIELD, RK
    [J]. PSYCHOLOGICAL BULLETIN, 1976, 83 (03) : 377 - 388
  • [7] REPLICATING CLUSTER-ANALYSIS - METHOD, CONSISTENCY, AND VALIDITY
    BRECKENRIDGE, JN
    [J]. MULTIVARIATE BEHAVIORAL RESEARCH, 1989, 24 (02) : 147 - 161
  • [8] BUCKLEY BP, 1996, GENETIC ALGORITHMS
  • [9] A coarse-grained parallel variable-complexity multidisciplinary optimization paradigm
    Burgee, S
    Giunta, AA
    Balabanov, V
    Grossman, B
    Mason, WH
    Narducci, R
    Haftka, RT
    Watson, LT
    [J]. INTERNATIONAL JOURNAL OF SUPERCOMPUTER APPLICATIONS AND HIGH PERFORMANCE COMPUTING, 1996, 10 (04): : 269 - 299
  • [10] Calinski T, 1974, Communications in Statisticstheory and Methods, V3, P1, DOI [DOI 10.1080/03610927408827101, 10.1080/03610917408548446]