Analysis of large screening data sets via adaptively grown phylogenetic-like trees

Cited by: 45
Authors
Nicolaou, CA [1]
Tamura, SY [1]
Kelley, BP [1]
Bassett, SI [1]
Nutt, RF [1]
Affiliations
[1] Bioreason Inc, Santa Fe, NM 87501 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002 / Vol. 42 / No. 05
DOI
10.1021/ci010244i
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
As the use of high-throughput screening systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of the resulting data. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to manageable sizes while clustering enables the detection of natural groups in that reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. The above process, designed to handle primarily data sets of sizes much smaller than the ones currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it also ignores all screening information related to compounds with activity less than the threshold chosen and thus, in the best case, can only hope to discover a subset of the knowledge available in the screening data sets. To address the deficiencies of the current screening data analysis process the authors have developed a new method that analyzes thoroughly large screening data sets. In this report we describe in detail this new approach and present its main differences with the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can improve significantly both the ease of extraction and amount of knowledge discovered from screening data sets.
Pages: 1069-1079
Number of pages: 11