Fuzzy clustering of 627 alcohols, guided by a strategy for cluster analysis of chemical compounds for combinatorial chemistry

Cited by: 28
Authors
Linusson, A [1]
Wold, S
Nordén, B
Affiliations
[1] Umea Univ, Dept Organ Chem, Chemometr Res Grp, S-90187 Umea, Sweden
[2] AB Hassle, S-43183 Molndal, Sweden
Keywords
cluster analysis; 627 alcohols; combinatorial chemistry; fuzzy clustering; PLS
DOI
10.1016/S0169-7439(98)00120-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A strategy for cluster analysis of chemical compounds for combinatorial chemistry is presented in this paper and applied to a set of 627 alcohols. The alcohols were characterised by 50 semi-empirical descriptors, and the resulting 627 × 50 table was compressed by PCA. The groupings were investigated by fuzzy clustering with the fuzzy c-means algorithm, a technique that allows a compound to belong to more than one group. Different values of the fuzziness coefficient were used, and two different distances were incorporated in the algorithm: the traditional Euclidean distance and the Mahalanobis distance. The latter takes correlations within a group into account and can hence deal with elongated clusters. The resulting membership matrices were validated by PLS regression, and the models created were used to verify the statistical and chemical relevance of the formed clusters. The results showed that the Mahalanobis distance and a fuzziness coefficient of 1.2 should be used for an optimal clustering. The coefficients from the PLS models were further used for chemical interpretation of the groups. The four groups were chemically interpretable and consistent. The first group contained large flexible molecules, the second contained more polar compounds, the third contained molecules with two or more aromatic rings fused together, and the fourth contained small and relatively reactive molecules. Molecules that did not fit into any of the groups, i.e., singletons, were flagged as outliers in the PLS models. © 1998 Elsevier Science B.V. All rights reserved.
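For readers unfamiliar with the workflow the abstract describes, the following is a minimal sketch of PCA compression followed by fuzzy c-means clustering. It is not the authors' implementation: the data are synthetic, the function names `pca_scores` and `fuzzy_c_means` are hypothetical, the unit-variance scaling before PCA is an assumption, and only the Euclidean distance is shown (the Mahalanobis variant and the PLS validation step are not reproduced). The fuzziness coefficient `m=1.2` is taken from the abstract.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centred, unit-variance-scaled data onto its leading principal components."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # scaling choice is an assumption
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, n_clusters, m=1.2, n_iter=200, tol=1e-6, seed=0):
    """Euclidean fuzzy c-means; returns (memberships, cluster centres)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; every row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centres are membership-weighted means, with weights u_ik ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every centre.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against a point sitting on a centre
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        # Dividing by the row minimum first keeps the power numerically stable.
        inv = (d2 / d2.min(axis=1, keepdims=True)) ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Synthetic stand-in for a compounds-by-descriptors table (not the alcohol data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 6)) for c in (-3.0, 0.0, 3.0)])

T = pca_scores(X, n_components=2)            # compressed score matrix
U, centers = fuzzy_c_means(T, n_clusters=3)  # fuzzy membership matrix
hard_labels = U.argmax(axis=1)               # optional crisp assignment
```

Each row of `U` gives one compound's degree of membership in every cluster, which is what lets a compound belong to more than one group. A fuzziness coefficient close to 1, such as 1.2, yields nearly crisp memberships, while larger values spread membership more evenly across clusters.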
Pages: 213-227
Number of pages: 15