Analysis of large screening data sets via adaptively grown phylogenetic-like trees

Cited by: 45
Authors
Nicolaou, CA [1]
Tamura, SY [1]
Kelley, BP [1]
Bassett, SI [1]
Nutt, RF [1]
Affiliations
[1] Bioreason Inc, Santa Fe, NM 87501 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002, Vol. 42, No. 5
DOI
10.1021/ci010244i
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
As the use of high-throughput screening systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of resulting data. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to a manageable size, while clustering enables the detection of natural groups in that reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. This process, designed primarily to handle data sets much smaller than those currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it ignores all screening information on compounds whose activity falls below the chosen threshold and thus, in the best case, can only hope to discover a subset of the knowledge available in the screening data sets. To address the deficiencies of the current screening data analysis process, the authors have developed a new method that thoroughly analyzes large screening data sets. In this report we describe this new approach in detail and present its main differences from the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can significantly improve both the ease of extraction and the amount of knowledge discovered from screening data sets.
Pages: 1069-1079
Number of pages: 11