Kolmogorov-Smirnov statistic and its application in library design

Cited by: 53
Authors
Rassokhin, DN [1]
Agrafiotis, DK [1]
Affiliations
[1] 3 Dimens Pharmaceut Inc, Exton, PA 19341 USA
Keywords
data mining; multiobjective optimization; simulated annealing; synchronous annealing; Kolmogorov-Smirnov; principal component analysis; nonlinear mapping; molecular descriptor; combinatorial chemistry; combinatorial library; high-throughput screening; molecular diversity; molecular similarity
DOI
10.1016/S1093-3263(00)00063-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
After several years of frantic development, the dream of an "ideal" library remains elusive. Traditionally, combinatorial chemistry has been used primarily for lead generation, and molecular diversity has been the method of choice for designing and prioritizing experiments. One aspect that often has been overlooked is the drug likeness of the resulting collections. Recently, there have been several attempts to quantify this concept and incorporate it directly into the design process. This article demonstrates the limitations of some conventional methodologies and proposes a new paradigm for experimental design based on the principles of multiobjective optimization. This method allows traditional design objectives such as diversity or similarity to be combined with secondary selection criteria in order to bias the selection toward more pharmacologically relevant regions of chemical space. The method is robust, general, and easily extensible, and it allows the medicinal chemist to create designs that represent the best compromise between several, often conflicting, objectives. Two types of designs are discussed (singles, arrays), and a novel criterion based on the Kolmogorov-Smirnov statistic is proposed as a means to enforce a particular distribution on key molecular properties that are related to drug likeness. The potential of this approach is illustrated in the design of an exploratory library based on the simultaneous optimization of five different parameters. These parameters are combined in an intuitive manner to produce a design that is sufficiently diverse, exhibits a molecular weight and logP profile that is consistent with the respective distributions of known drugs, requires a small number of reagents, and can be synthesized easily in array format using robotic hardware. © 2000 by Elsevier Science Inc.
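
The abstract does not spell out how the Kolmogorov-Smirnov criterion is computed. As a rough illustration only, the sketch below computes the standard two-sample Kolmogorov-Smirnov statistic, the largest absolute gap between two empirical cumulative distribution functions, which is one way to score how closely a property profile of a candidate selection follows a reference profile. The function name `ks_statistic` and the molecular-weight values are hypothetical and made up for this example; they are not taken from the article, and the authors' actual formulation may differ.

```python
from bisect import bisect_right


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of `sample` and `reference` (0.0 = identical profiles, 1.0 = disjoint).
    """
    a = sorted(sample)
    b = sorted(reference)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so checking every observed value finds the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Made-up molecular weights, for illustration only.
    candidate_mw = [310.0, 355.0, 402.0, 448.0, 520.0]
    reference_mw = [250.0, 300.0, 340.0, 380.0, 420.0, 460.0]
    print(ks_statistic(candidate_mw, reference_mw))
```

In a multiobjective setting of the kind the abstract describes, a value like this could be treated as a penalty to minimize alongside a diversity score; how the terms are weighted and combined is a design choice this sketch does not attempt to reproduce.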
Pages: 368-382
Page count: 15