Rational combinatorial library design. 3. Simulated annealing guided evaluation (SAGE) of molecular diversity: A novel computational tool for universal library design and database mining

Cited by: 43
Authors
Zheng, WF
Cho, SJ
Waller, CL
Tropsha, A [1]
Affiliations
[1] Univ N Carolina, Sch Pharm, Div Med Chem, Lab Mol Modeling, Chapel Hill, NC 27599 USA
[2] OSI Pharmaceut Inc, Durham, NC 27707 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1999 / Vol. 39 / No. 4
Keywords
DOI
10.1021/ci980103p
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
We have developed a novel method for molecular diversity sampling called SAGE (simulated annealing guided evaluation of molecular diversity). Compounds in chemical databases or virtual combinatorial libraries are conventionally represented as points in a multidimensional descriptor space. The SAGE algorithm selects a desired number of optimally diverse points (compounds) from a database. The diversity of a subset of points is measured by a specially designed diversity function, and the most diverse subset is selected using simulated annealing (SA) as the optimization tool. Application of SAGE to two simulated data sets of randomly distributed points in two-dimensional space afforded diverse and representative selections, as judged by visual inspection. SAGE was also compared with random sampling on two other simulated data sets in which the points were distributed among many clusters. We found that SAGE sampling covered significantly more clusters than random sampling. By defining a fraction of the data points as active, we also compared SAGE with random sampling in terms of hit rates. When the percentage of active points was low, the hit rates obtained by SAGE were always higher than those obtained by random sampling. When the percentage of active points was high, the performance of SAGE, in terms of individual hit rates, depended on the data structure. In all cases, however, SAGE performed better than random sampling when cluster hit rates were used as the criterion.
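The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the SAGE diversity function, so the sum of inverse squared pairwise distances is used here as a stand-in, and the names `diversity_energy` and `sa_select`, the swap move, and the geometric cooling schedule are all assumptions made for illustration.

```python
import math
import random


def diversity_energy(points, subset):
    """Stand-in diversity score (an assumption, not the SAGE function):
    sum of inverse squared pairwise distances; lower means more spread out."""
    total = 0.0
    for a in range(len(subset)):
        for b in range(a + 1, len(subset)):
            (x1, y1), (x2, y2) = points[subset[a]], points[subset[b]]
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            total += 1.0 / (d2 + 1e-12)  # small constant guards against d2 == 0
    return total


def sa_select(points, k, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Pick k of the given 2D points by simulated annealing (requires k < len(points)).

    Each move swaps one selected point with one unselected point and is
    accepted by the Metropolis criterion at the current temperature.
    """
    rng = random.Random(seed)
    n = len(points)
    selected = rng.sample(range(n), k)
    chosen = set(selected)
    unselected = [i for i in range(n) if i not in chosen]
    energy = diversity_energy(points, selected)
    best, best_energy = list(selected), energy
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (hypothetical schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(k)
        j = rng.randrange(len(unselected))
        selected[i], unselected[j] = unselected[j], selected[i]
        new_energy = diversity_energy(points, selected)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy
            if energy < best_energy:
                best, best_energy = list(selected), energy
        else:
            # Rejected move: undo the swap.
            selected[i], unselected[j] = unselected[j], selected[i]
    return best, best_energy


if __name__ == "__main__":
    data_rng = random.Random(1)
    pts = [(data_rng.random(), data_rng.random()) for _ in range(200)]
    subset, score = sa_select(pts, 10, steps=5000)
    print(sorted(subset), score)
```

The energy is recomputed in full after every swap to keep the sketch short; an incremental update over only the swapped point's pairs would be the natural refinement for large databases.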
Pages: 738-746
Page count: 9