An interior point algorithm for minimum sum-of-squares clustering

Cited by: 66
Authors
Du Merle, O [1]
Hansen, P
Jaumard, B
Mladenovic, N
Affiliations
[1] McGill Univ, Fac Management, GERAD, Montreal, PQ, Canada
[2] Ecole HEC, GERAD, Dept Methodes Quantitat Gest, Montreal, PQ, Canada
[3] Ecole Polytech, GERAD, Montreal, PQ H3C 3A7, Canada
Keywords
classification and discrimination; cluster analysis; interior-point methods; combinatorial optimization
DOI
10.1137/S1064827597328327
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program in 0-1 variables. The resolution method combines an interior point algorithm, i.e., a weighted analytic center column generation method, with branch-and-bound. The auxiliary problem of determining the entering column (i.e., the oracle) is an unconstrained hyperbolic program in 0-1 variables with a quadratic numerator and linear denominator. It is solved through a sequence of unconstrained quadratic programs in 0-1 variables. To accelerate resolution, variable neighborhood search heuristics are used both to get a good initial solution and to solve quickly the auxiliary problem as long as global optimality is not reached. Estimated bounds for the dual variables are deduced from the heuristic solution and used in the resolution process as a trust region. Proved minimum sum-of-squares partitions are determined for the first time for several fairly large data sets from the literature, including Fisher's 150 iris.
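As an illustration of the objective described in the abstract, the following minimal Python sketch evaluates the sum-of-squares cost of a given partition. It also checks the standard identity that a cluster's cost equals the sum of squared pairwise distances divided by the cluster size, which is one way to see why a quadratic numerator over a linear denominator arises. The function names and the toy data are assumptions made for this example; the sketch is not the authors' algorithm and performs no optimization.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def cost_centroid(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def cost_pairwise(cluster):
    """Same cost written as a ratio: pairwise squared distances / cluster size."""
    pair_sum = sum(sq_dist(p, q) for p, q in combinations(cluster, 2))
    return pair_sum / len(cluster)


def mssc_objective(points, labels):
    """Total sum-of-squares cost of the partition encoded by `labels`."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    return sum(cost_centroid(c) for c in clusters.values())


# Hypothetical toy data: four points in the plane, split into two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
total = mssc_objective(points, labels)
print(total)
```

In the pairwise form, selecting a cluster with 0-1 variables makes the numerator quadratic and the denominator (the cluster size) linear in those variables, which matches the structure the abstract attributes to the auxiliary problem.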
Pages: 1485-1505
Number of pages: 21