NIPALSTREE: A new hierarchical clustering approach for large compound libraries and its application to virtual screening

Cited by: 23
Authors
Boecker, Alexander
Schneider, Gisbert
Teckentrup, Andreas
Affiliations
[1] Goethe Univ Frankfurt, Inst Organ Chem & Chem Biol, D-60439 Frankfurt, Germany
[2] Boehringer Ingelheim Pharma GmbH & Co KG, Dept Lead Discovery, D-88397 Biberach, Germany
Keywords
DOI
10.1021/ci050541d
Chinese Library Classification number
R914 [Medicinal chemistry];
Subject classification code
100701 ;
Abstract
A hierarchical clustering algorithm, NIPALSTREE, was developed that is able to analyze large data sets in high-dimensional space. The result can be displayed as a dendrogram. At each tree level the algorithm projects a data set via principal component analysis onto one dimension. The data set is sorted according to this one dimension and split at the median position. To avoid distortion of clusters at the median position, the algorithm identifies a potentially more suitable split point left or right of the median. The procedure is recursively applied on the resulting subsets until the maximal distance between cluster members exceeds a user-defined threshold. The approach was validated in a retrospective screening study for angiotensin converting enzyme (ACE) inhibitors. The resulting clusters were assessed for their purity and enrichment in actives belonging to this ligand class. Enrichment was observed in individual branches of the dendrogram. In further retrospective virtual screening studies employing the MDL Drug Data Report (MDDR), COBRA, and the SPECS catalog, NIPALSTREE was compared with the hierarchical k-means clustering approach. Results show that both algorithms can be used in the context of virtual screening. Intersecting the result lists obtained with both algorithms improved enrichment factors while losing only a few chemotypes.
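The projection and median split described in the abstract can be illustrated with a minimal sketch of a single tree level. It is not the authors' implementation: it assumes a dense NumPy descriptor matrix with non-constant columns, the function names `nipals_first_component` and `median_split` are hypothetical, and both the search for a better split point next to the median and the recursive stopping rule are left out.

```python
import numpy as np

def nipals_first_component(X, max_iter=500, tol=1e-10):
    """Scores and loading of the first principal component via NIPALS.

    Assumes X has at least one non-constant column (illustrative only).
    """
    Xc = X - X.mean(axis=0)                 # mean-center each descriptor column
    # Start the iteration from the column with the largest variance.
    t = Xc[:, np.argmax(Xc.var(axis=0))].copy()
    for _ in range(max_iter):
        p = Xc.T @ t / (t @ t)              # loading: regress columns on scores
        p /= np.linalg.norm(p)              # normalise the loading vector
        t_new = Xc @ p                      # scores: project rows onto the loading
        converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t, p

def median_split(X):
    """Sort rows by first-component score and split at the median position."""
    t, _ = nipals_first_component(X)
    order = np.argsort(t, kind="stable")    # row indices sorted along one dimension
    mid = len(order) // 2
    return order[:mid], order[mid:]

# Toy data: two well-separated groups of 20 points in 4 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5.0, 0.5, (20, 4)),
               rng.normal(5.0, 0.5, (20, 4))])
left, right = median_split(X)
```

Applying `median_split` again to `X[left]` and `X[right]` would give the next tree level. The sign of a principal component is arbitrary, so which group ends up in `left` is not fixed.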
Pages: 2220-2229
Page count: 10
Related papers
35 in total
[1]   ACE revisited: A new target for structure-based drug design [J].
Acharya, KR ;
Sturrock, ED ;
Riordan, JF ;
Ehlers, MRW .
NATURE REVIEWS DRUG DISCOVERY, 2003, 2 (11) :891-902
[2]   Integration of virtual and high-throughput screening [J].
Bajorath, J .
NATURE REVIEWS DRUG DISCOVERY, 2002, 1 (11) :882-894
[3]  
BARNARD JM, 2004, 3 JOINT SHEFF C CHEM
[4]   A hierarchical clustering approach for large compound libraries [J].
Böcker, A ;
Derksen, S ;
Schmidt, E ;
Teckentrup, A ;
Schneider, G .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2005, 45 (04) :807-815
[5]   Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection [J].
Brown, RD ;
Martin, YC .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1996, 36 (03) :572-584
[6]   Algorithm5: A technique for fuzzy similarity clustering of chemical inventories [J].
Doman, TN ;
Cibulskis, JM ;
Cibulskis, MJ ;
McCray, PD ;
Spangler, DP .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1996, 36 (06) :1195-1204
[7]  
Duda R. O., 2000, PATTERN CLASSIFICATI
[8]  
Godden JW, 2000, J MOL GRAPH MODEL, V18, P73
[9]   Differential Shannon entropy as a sensitive measure of differences in database variability of molecular descriptors [J].
Godden, JW ;
Bajorath, J .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2001, 41 (04) :1060-1066
[10]  
Gohlke H, 2002, ANGEW CHEM INT EDIT, V41, P2645