Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures

Cited by: 48
Authors
Raymond, JW
Blankley, CJ
Willett, P
Affiliations
[1] Pfizer Global Res & Dev, Ann Arbor Labs, Ann Arbor, MI 48105 USA
[2] Univ Sheffield, Krebs Inst Biomol Res, Sheffield S10 2TN, S Yorkshire, England
[3] Univ Sheffield, Dept Informat Studies, Sheffield S10 2TN, S Yorkshire, England
Keywords
bit-string; chemical graph; chemical series; clustering method; fingerprint; maximum common subgraph; molecular similarity; similarity
DOI
10.1016/S1093-3263(02)00188-2
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods that employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings; they also indicate that the CAST and Yin-Chen methods, recently proposed for clustering gene expression patterns, may prove effective for clustering 2D chemical structures. © 2002 Elsevier Science Inc. All rights reserved.
Pages: 421-433
Page count: 13