Database diversity assessment: New ideas, concepts, and tools

Cited by: 46
Authors
Nilakantan, R
Bauman, N
Haraki, KS
Affiliations
[1] Wyeth-Ayerst Research, Pearl River, NY 10965, North Middletown Road
[2] Tomkins Cove, NY 10986
Keywords
similarity; comparison; database; ring; ring-cluster; combinatorial
DOI
10.1023/A:1007937308615
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We present some new ideas for characterizing and comparing large chemical databases. The comparison of the contents of large databases is not trivial since it implies pairwise comparison of hundreds of thousands of compounds. We have developed methods for categorizing compounds into groups or series based on their ring-system content, using precalculated structure-based hashcodes. Two large databases can then be compared by simply comparing their hashcode tables. Furthermore, the number of distinct ring-system combinations can be used as an indicator of database diversity. We also present an independent technique for diversity assessment called the 'saturation diversity' approach. This method is based on picking as many mutually dissimilar compounds as possible from a database or a subset thereof. We show that both methods yield similar results. Since the two methods measure very different properties, this probably says more about the properties of the databases studied than about the methods.
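The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy databases, the use of a `frozenset` of ring-system names as a stand-in for a precalculated hashcode, the Tanimoto coefficient, the 0.5 threshold, and the greedy selection order are all assumptions made for illustration. In particular, a greedy pass yields a maximal set of mutually dissimilar compounds, which is not necessarily the largest possible one.

```python
from collections import Counter

# Hypothetical toy data: each compound is reduced to the set of ring systems
# it contains. The frozenset stands in for a precalculated hashcode.
DB_A = [
    frozenset({"benzene"}),
    frozenset({"benzene", "pyridine"}),
    frozenset({"benzene"}),
    frozenset({"indole"}),
]
DB_B = [
    frozenset({"benzene"}),
    frozenset({"furan"}),
]


def hashcode_table(compounds):
    """Count how many compounds share each ring-system combination."""
    return Counter(compounds)


def compare_tables(table_a, table_b):
    """Compare two databases by their table keys, with no pairwise compound comparison."""
    keys_a, keys_b = set(table_a), set(table_b)
    return {
        "shared": keys_a & keys_b,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
    }


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def saturation_pick(compounds, threshold=0.5):
    """Greedily keep each compound that is dissimilar to everything kept so far."""
    picked = []
    for compound in compounds:
        if all(tanimoto(compound, kept) < threshold for kept in picked):
            picked.append(compound)
    return picked


table_a = hashcode_table(DB_A)
table_b = hashcode_table(DB_B)
comparison = compare_tables(table_a, table_b)

# Number of distinct ring-system combinations, used as a diversity indicator.
distinct_a = len(table_a)
distinct_b = len(table_b)

# Size of the mutually dissimilar subset, a second diversity indicator.
saturation_a = len(saturation_pick(DB_A))
```

In this toy case `DB_A` has three distinct ring-system combinations and a saturation subset of two compounds, and the two databases share only the benzene-only group. Comparing tables costs time proportional to the number of distinct groups rather than the number of compound pairs, which is the point the abstract makes about avoiding pairwise comparison.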
Pages: 447-452
Page count: 6
Related papers
18 in total
  • [1] CAMBRIDGE CRYSTALLOGRAPHIC DATA CENTER - COMPUTER-BASED SEARCH, RETRIEVAL, ANALYSIS AND DISPLAY OF INFORMATION
    ALLEN, FH
    BELLARD, S
    BRICE, MD
    CARTWRIGHT, BA
    DOUBLEDAY, A
    HIGGS, H
    HUMMELINK, T
    HUMMELINKPETERS, BG
    KENNARD, O
    MOTHERWELL, WDS
    RODGERS, JR
    WATSON, DG
    [J]. ACTA CRYSTALLOGRAPHICA SECTION B-STRUCTURAL SCIENCE, 1979, 35 (OCT) : 2331 - 2339
  • [2] SUBSTRUCTURE SEARCHING METHODS - OLD AND NEW
    BARNARD, JM
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1993, 33 (04) : 532 - 538
  • [3] The properties of known drugs .1. Molecular frameworks
    Bemis, GW
    Murcko, MA
    [J]. JOURNAL OF MEDICINAL CHEMISTRY, 1996, 39 (15) : 2887 - 2893
  • [4] CHARACTERIZING THE GEOMETRIC DIVERSITY OF FUNCTIONAL-GROUPS IN CHEMICAL DATABASES
    BOYD, SM
    BEVERLEY, M
    NORSKOV, L
    HUBBARD, RE
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1995, 9 (05) : 417 - 424
  • [5] ATOM PAIRS AS MOLECULAR-FEATURES IN STRUCTURE ACTIVITY STUDIES - DEFINITION AND APPLICATIONS
    CARHART, RE
    SMITH, DH
    VENKATARAGHAVAN, R
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1985, 25 (02) : 64 - 73
  • [6] Molecular diversity in chemical databases: Comparison of medicinal chemistry knowledge bases and databases of commercially available compounds
    Cummins, DJ
    Andrews, CW
    Bentley, JA
    Cory, M
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1996, 36 (04) : 750 - 763
  • [7] Downs G. M., 1996, REV COMP CH, V7, P1
  • [8] DURRETT R, 1991, PROBABILITY THEORY E, P45
  • [9] A fast algorithm for selecting sets of dissimilar molecules from large chemical databases
    Holliday, JD
    Ranade, SS
    Willett, P
    [J]. QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS, 1995, 14 (06) : 501 - 506
  • [10] CLUSTERING USING A SIMILARITY MEASURE BASED ON SHARED NEAR NEIGHBORS
    JARVIS, RA
    PATRICK, EA
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 1973, C-22 (11) : 1025 - 1034