Uniform coverage designs for molecule selection

Cited by: 27
Authors
Lam, RLH [1]
Welch, WJ
Young, SS
Affiliations
[1] GlaxoSmithKline Inc, Biomed Data Sci, Mississauga, ON L5N 6L4, Canada
[2] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON N2L 3G1, Canada
[3] GlaxoSmithKline Inc, Stat Res Unit, Res Triangle Pk, NC 27709 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
binning; drug discovery; exchange algorithm; high throughput screening; projection; space-filling design
DOI
10.1198/004017002317375055
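Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract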
In screening for drug discovery, chemists often select a large subset of molecules from a very large database (e.g., select 1,000 molecules from 100,000). To generate diverse leads for drug optimization, highly active compounds in several structurally different chemical classes are sought. Molecules can be characterized by numerical descriptors, and the chosen subset should cover the descriptor space or subspaces formed by several descriptors. We propose a method that concentrates on low-dimensional subspaces, a criterion for uniformity of coverage, and a fast exchange algorithm to optimize the criterion. These methods are illustrated by using a National Cancer Institute database.
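The abstract names three ingredients: binned low-dimensional subspaces, a uniformity criterion, and an exchange algorithm. The record does not give the paper's actual criterion or algorithm, so the following is only a minimal sketch under stated assumptions: descriptors are scaled to [0, 1), each 1-D and 2-D subspace is cut into equal-width bins, and uniformity is scored as the squared deviation of cell counts from an equal-share target. The function names (`cell_counts`, `coverage_score`, `exchange_select`) and the scoring rule are hypothetical, and the swap loop is a plain (not fast) exchange.

```python
import itertools
import random


def cell_counts(points, dims, bins):
    """Count the points falling in each cell of the subspace spanned by dims."""
    counts = {}
    for p in points:
        # Assumes every descriptor is scaled to [0, 1); 1.0 is clamped to the last bin.
        cell = tuple(min(int(p[d] * bins), bins - 1) for d in dims)
        counts[cell] = counts.get(cell, 0) + 1
    return counts


def coverage_score(points, n_dims, bins=4, max_sub_dim=2):
    """Illustrative criterion (an assumption, not the paper's): squared deviation
    of cell counts from an equal-share target, summed over all subspaces of
    dimension 1..max_sub_dim. Lower means more uniform coverage."""
    score = 0.0
    for k in range(1, max_sub_dim + 1):
        n_cells = bins ** k
        target = len(points) / n_cells
        for dims in itertools.combinations(range(n_dims), k):
            counts = cell_counts(points, dims, bins)
            occupied = sum((c - target) ** 2 for c in counts.values())
            empty = (n_cells - len(counts)) * target ** 2
            score += occupied + empty
    return score


def exchange_select(database, n_select, bins=4, max_passes=5, seed=0):
    """Plain exchange: swap a selected molecule with an unselected one and
    keep the swap only if the coverage score decreases."""
    rng = random.Random(seed)
    n_dims = len(database[0])
    selected = rng.sample(range(len(database)), n_select)
    chosen = set(selected)
    outside = [i for i in range(len(database)) if i not in chosen]
    best = coverage_score([database[i] for i in selected], n_dims, bins)
    for _ in range(max_passes):
        improved = False
        for s in range(n_select):
            for o in range(len(outside)):
                selected[s], outside[o] = outside[o], selected[s]
                score = coverage_score([database[i] for i in selected], n_dims, bins)
                if score < best:
                    best, improved = score, True
                else:
                    # Undo a swap that did not improve the score.
                    selected[s], outside[o] = outside[o], selected[s]
        if not improved:
            break
    return selected, best
```

For example, `exchange_select(db, 8, bins=2)` on a list `db` of descriptor tuples returns eight row indices and their score. Recomputing the full score after every trial swap is the simplest choice but scales poorly; at the sizes mentioned in the abstract (1,000 from 100,000) an incremental update of the cell counts would be needed.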
Pages: 99-109
Number of pages: 11