SIMILARITY SEARCHING AND CLUSTERING OF CHEMICAL-STRUCTURE DATABASES USING MOLECULAR PROPERTY DATA

Cited by: 121
Authors
DOWNS, GM
WILLETT, P
FISANICK, W
Affiliations
[1] UNIV SHEFFIELD,DEPT INFORMAT STUDIES,SHEFFIELD S10 2TN,S YORKSHIRE,ENGLAND
[2] CHEM ABSTRACTS SERV INC,COLUMBUS,OH 43210
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1994 / Vol. 34 / Issue 05
DOI
10.1021/ci00021a011
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Previous work on the clustering of chemical-structure databases has focused on the use of intermolecular similarity measures that are based on structural features of various kinds. In this paper, we report nearest-neighbor searching and clustering experiments with a set of 5982 molecules, each of which is characterized by 13 calculated global molecular properties. The nearest-neighbor algorithm is an upper-bound procedure that uses the triangle inequality to minimize the number of distance calculations that need to be carried out when searching for nearest neighbors in metric spaces. Our experiments suggest that it performs well when small numbers of nearest neighbors are required, but that the basic "brute-force" procedure is best when large numbers are needed, such as when clustering is to be carried out. The clustering methods tested are the Ward and group-average hierarchic agglomerative methods, the minimum-diameter polythetic hierarchic divisive method, and the Jarvis-Patrick nearest-neighbor method. Our experiments suggest that the first three methods, which gave similar results, are the best methods for clustering molecules characterized by property data. The Jarvis-Patrick method, which has been extensively used for clustering molecules characterized by structural fragments, was not as effective as these other methods.
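The idea of using the triangle inequality to avoid distance calculations can be illustrated with a minimal sketch. This is not the upper-bound procedure evaluated in the paper, whose details are not given in the abstract. It is a simpler single-pivot variant, assuming Euclidean distance on property vectors. The function names (`build_pivot_table`, `nearest_neighbor`, `brute_force`) and the choice of pivot are hypothetical and introduced only for illustration. For a query q, a pivot p, and a database point x, the triangle inequality gives d(q, x) >= |d(q, p) - d(x, p)|, so any x whose lower bound already reaches the best distance found so far can be skipped without computing d(q, x).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(points, pivot):
    """Precompute d(x, pivot) for every database point x (done once)."""
    return [euclidean(p, pivot) for p in points]


def nearest_neighbor(query, points, pivot, pivot_dists):
    """Single nearest neighbor with triangle-inequality pruning.

    Returns (index, distance, number_of_distance_calculations).
    Illustrative single-pivot scheme, not the paper's algorithm.
    """
    d_qp = euclidean(query, pivot)
    calls = 1
    # Visit candidates in order of increasing lower bound
    # |d(q, pivot) - d(x, pivot)| <= d(q, x).
    order = sorted(range(len(points)),
                   key=lambda i: abs(d_qp - pivot_dists[i]))
    best_i, best_d = -1, math.inf
    for i in order:
        lower = abs(d_qp - pivot_dists[i])
        if lower >= best_d:
            # Every remaining candidate has an equal or larger lower
            # bound, so none of them can beat the current best.
            break
        d = euclidean(query, points[i])
        calls += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, calls


def brute_force(query, points):
    """Reference search: compute every distance and take the minimum."""
    dists = [euclidean(query, p) for p in points]
    i = min(range(len(points)), key=dists.__getitem__)
    return i, dists[i]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for molecules described by 13 properties.
    data = [[random.random() for _ in range(13)] for _ in range(500)]
    pivot = data[0]
    table = build_pivot_table(data, pivot)
    q = [random.random() for _ in range(13)]
    print(nearest_neighbor(q, data, pivot, table))
    print(brute_force(q, data))
```

How much work the pruning saves depends on the data and the pivot. When many neighbors are needed, as in clustering, the lower bounds exclude fewer candidates, which is consistent with the abstract's observation that the brute-force procedure is preferable in that setting.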
Pages: 1094-1102
Number of pages: 9