A fast algorithm for selecting sets of dissimilar molecules from large chemical databases

被引:111
作者
Holliday, JD
Ranade, SS
Willett, P
机构
[1] UNIV SHEFFIELD,KREBS INST BIOMOLEC RES,SHEFFIELD S10 2TN,S YORKSHIRE,ENGLAND
[2] UNIV SHEFFIELD,DEPT INFORMAT STUDIES,SHEFFIELD S10 2TN,S YORKSHIRE,ENGLAND
来源
QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS | 1995年 / 14卷 / 06期
关键词
algorithmic complexity; compound selection; dissimilarity selection; random screening; similarity coefficient;
D O I
10.1002/qsar.19950140602
中图分类号
R914 [药物化学];
学科分类号
100701 ;
摘要
Current algorithms for the selection of a set of n dissimilar molecules from a dataset of N molecules have an expected time complexity of O(n(2)N). This paper describes an improved algorithm that has an expected time complexity of O(nN) and that will identify exactly the same set of molecules as the normal algorithm if the cosine coefficient is used for the calculation of the inter-molecular (dis)similarities. The algorithm is applicable to any type of representation that characterises a molecule by a set of attribute values and to any procedure that involves calculating a sum of inter-molecular similarities. It is also both more effective and more efficient than our implementation of a genetic algorithm for the selection of maximally-dissimilar sets of molecules.
引用
收藏
页码:501 / 506
页数:6
相关论文
共 25 条
[1]   CLUSTERING OF CHEMICAL STRUCTURES ON THE BASIS OF 2-DIMENSIONAL SIMILARITY MEASURES [J].
BARNARD, JM ;
DOWNS, GM .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1992, 32 (06) :644-649
[2]  
BAWDEN D, 1990, CONCEPTS AND APPLICATIONS OF MOLECULAR SIMILARITY, P65
[3]  
BAWDEN D, 1993, CHEM STRUCTURES, V2, P383
[4]   INCLUSION OF RELEVANCE INFORMATION IN THE TERM DISCRIMINATION MODEL [J].
BIRU, T ;
ELHAMDOUCHI, A ;
REES, RS ;
WILLETT, P .
JOURNAL OF DOCUMENTATION, 1989, 45 (02) :85-109
[5]   APPLICATION OF GENETIC ALGORITHMS IN MOLECULAR MODELING [J].
BRODMEIER, T ;
PRETSCH, E .
JOURNAL OF COMPUTATIONAL CHEMISTRY, 1994, 15 (06) :588-595
[6]  
CUTTING DR, 1992, P ANN INT C RES DEV, V15, P318
[7]  
Davis L., 1991, HDB GENETIC ALGORITH
[8]  
Downs G. M., 1994, ADV COMPUTER ASSISTE, P111
[9]  
DOWNS GM, IN PRESS REV COMPUTA
[10]   AN IMPROVED ALGORITHM FOR THE CALCULATION OF EXACT TERM DISCRIMINATION VALUES [J].
ELHAMDOUCHI, A ;
WILLETT, P .
INFORMATION PROCESSING & MANAGEMENT, 1988, 24 (01) :17-22