Analysis of large screening data sets via adaptively grown phylogenetic-like trees

Times cited: 45
Authors
Nicolaou, CA [1]
Tamura, SY [1]
Kelley, BP [1]
Bassett, SI [1]
Nutt, RF [1]
Affiliation
[1] Bioreason Inc, Santa Fe, NM 87501 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002 / Vol. 42 / No. 5
DOI
10.1021/ci010244i
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
As the use of high-throughput screening systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of resulting data. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to a manageable size, while clustering enables the detection of natural groups in that reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. This process, designed primarily for data sets much smaller than those currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it ignores all screening information on compounds whose activity falls below the chosen threshold and thus, in the best case, can only hope to discover a subset of the knowledge available in the screening data sets. To address these deficiencies of the current screening data analysis process, the authors have developed a new method that thoroughly analyzes large screening data sets. In this report we describe the new approach in detail and present its main differences from the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can significantly improve both the ease of extraction and the amount of knowledge discovered from screening data sets.
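The conventional threshold-then-cluster workflow that the abstract critiques can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' adaptively grown tree method: each compound's fingerprint is modeled as a hypothetical set of "on" feature bits, similarity is the Tanimoto coefficient, and clustering uses a simple sphere-exclusion scheme in the spirit of Taylor-Butina clustering, chosen only for brevity. The function names, cutoffs, and toy data are illustrative.

```python
from typing import Dict, FrozenSet, List

# Hypothetical fingerprint representation: the set of "on" feature bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def threshold_filter(activities: Dict[str, float], cutoff: float) -> List[str]:
    """Data reduction step: keep compounds at or above the activity cutoff.

    Every compound below the cutoff is discarded, together with any
    structure-activity information it carried.
    """
    return sorted(cid for cid, act in activities.items() if act >= cutoff)


def sphere_exclusion_cluster(
    ids: List[str], fps: Dict[str, Fingerprint], sim_cutoff: float
) -> List[List[str]]:
    """Group the reduced subset into clusters of structurally similar compounds."""
    # Neighbor lists: other compounds whose similarity meets the cutoff.
    neighbors = {
        i: {j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= sim_cutoff}
        for i in ids
    }
    unassigned = set(ids)
    clusters: List[List[str]] = []
    while unassigned:
        # Centroid: the unassigned compound with the most unassigned neighbors;
        # max() keeps the first maximum, so ties go to the smallest id.
        centroid = max(sorted(unassigned), key=lambda i: len(neighbors[i] & unassigned))
        members = [centroid] + sorted(neighbors[centroid] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


if __name__ == "__main__":
    # Toy data (percent inhibition and fingerprints are made up for illustration).
    activities = {"c1": 85.0, "c2": 72.0, "c3": 90.0, "c4": 12.0, "c5": 65.0}
    fps = {
        "c1": frozenset({1, 2, 3, 4}),
        "c2": frozenset({1, 2, 3, 5}),
        "c3": frozenset({10, 11, 12}),
        "c4": frozenset({1, 2, 3, 4}),  # same bits as c1, but inactive
        "c5": frozenset({10, 11, 13}),
    }
    hits = threshold_filter(activities, cutoff=50.0)
    print(hits)
    print(sphere_exclusion_cluster(hits, fps, sim_cutoff=0.5))
```

In this toy run, `c4` has the same fingerprint as the active `c1` but is dropped by the activity cutoff before clustering ever sees it, which is the kind of information loss the abstract attributes to threshold-based data reduction.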
Pages: 1069-1079
Number of pages: 11