Comparison of 2D fingerprint types and hierarchy level selection methods for structural grouping using Ward's clustering

Cited by: 71
Authors
Wild, DJ [1]
Blankley, CJ [1]
Affiliations
[1] Warner Lambert Parke Davis, Parke Davis Pharmaceut Res Div, Ann Arbor, MI 48105 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2000 / Vol. 40 / Issue 01
DOI
10.1021/ci990086j
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
Four different two-dimensional fingerprint types (MACCS, Unity, BCI, and Daylight) and nine methods of selecting optimal cluster levels from the output of a hierarchical clustering algorithm were evaluated for their ability to select clusters that represent chemical series present in some typical examples of chemical compound data sets. The methods were evaluated using a Ward's clustering algorithm on subsets of the publicly available National Cancer Institute HIV data set, as well as with compounds from our corporate data set. We make a number of observations and recommendations about the choice of fingerprint type and cluster level selection methods for use in this type of clustering.
Pages: 155-162
Page count: 8