Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers

Cited by: 60
Authors
Monge, Aurelien [1]
Arrault, Alban [1]
Marot, Christophe [1]
Morin-Allory, Luc [1]
Affiliation
[1] Univ Orleans, UMR 6005, CNRS, Inst Chim Organ & Analyt, F-45067 Orleans 2, France
Keywords
chemical databases; chemoinformatics; diversity; drug-like; lead-like; screening
DOI
10.1007/s11030-006-9033-5
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The data for 3.8 million compounds from the structural databases of 32 providers were gathered and stored in a single chemical database. Duplicates were removed using the IUPAC International Chemical Identifier (InChI), leaving 2.6 million unique compounds. Each provider database, as well as the merged one, was studied in terms of uniqueness, diversity, molecular frameworks, and 'drug-like' and 'lead-like' properties. The merged database contains more than 97 000 frameworks and 2.1 million 'drug-like' molecules, of which more than one million are 'lead-like'. The study was carried out with 'ScreeningAssistant', a software package dedicated to chemical database management and screening-set generation. Compounds are stored in a MySQL database, and all operations on it are performed by Java code. Drug-likeness and lead-likeness are estimated with in-house scores built from functions that rate how well each molecular property fits the desired profile; uniqueness is assessed with the InChI code, and diversity with molecular frameworks and fingerprints. The software was designed to make updating the database straightforward. 'ScreeningAssistant' is freely available under the GPL license.
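The abstract states that duplicates were removed with the InChI and that each provider's database was profiled for uniqueness. A minimal Python sketch of that bookkeeping follows, assuming each catalogue entry has already been reduced to a hypothetical `(provider, compound_id, inchi)` tuple. The function names, provider names and records are illustrative only and are not ScreeningAssistant's Java/MySQL code. Two entries are treated as the same compound when their InChI strings are identical, and a provider's uniqueness is taken here (an assumption) as the share of its distinct compounds that no other provider lists.

```python
from collections import defaultdict


def merge_by_inchi(records):
    """Map each distinct InChI string to the set of providers that list it."""
    providers_by_inchi = defaultdict(set)
    for provider, _compound_id, inchi in records:
        providers_by_inchi[inchi.strip()].add(provider)
    return dict(providers_by_inchi)


def provider_uniqueness(providers_by_inchi):
    """Share of each provider's distinct compounds that no other provider lists."""
    total, exclusive = defaultdict(int), defaultdict(int)
    for providers in providers_by_inchi.values():
        for provider in providers:
            total[provider] += 1
            if len(providers) == 1:
                exclusive[provider] += 1
    return {p: exclusive[p] / total[p] for p in total}


# Standard InChI strings for three small molecules.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
ACETIC_ACID = "InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)"

# Hypothetical catalogue entries: (provider, provider's compound ID, InChI).
records = [
    ("ProviderA", "A-001", ETHANOL),
    ("ProviderA", "A-002", BENZENE),
    ("ProviderA", "A-003", BENZENE),      # duplicate inside one catalogue
    ("ProviderB", "B-017", ETHANOL),      # same molecule from another provider
    ("ProviderB", "B-018", ACETIC_ACID),
]

merged = merge_by_inchi(records)
print(f"{len(records)} records -> {len(merged)} unique compounds")
print(provider_uniqueness(merged))
```

In a relational store such as the MySQL database the abstract mentions, the same grouping could be expressed as a GROUP BY over an indexed InChI column; how salts, stereochemistry or tautomers collapse depends on the InChI options used, which the abstract does not specify.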
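The drug-likeness and lead-likeness scores are described only as in-house functions that rate property values. The sketch below assumes one common way to build such a score: a piecewise-linear desirability function per property, combined by geometric mean, with a cutoff that turns the score into a label. The property names, `(low, high, tolerance)` ranges and the 0.8 cutoff are hypothetical placeholders (the drug-like upper bounds loosely follow Lipinski's rule of five, and the lead-like profile simply tightens them), not the paper's parameters. Properties are assumed to be precomputed.

```python
import math

# Hypothetical (low, high, tolerance) per property; NOT the paper's values.
DRUG_LIKE = {"mw": (150.0, 500.0, 100.0), "logp": (-2.0, 5.0, 1.5),
             "hbd": (0, 5, 2), "hba": (0, 10, 3)}
LEAD_LIKE = {"mw": (150.0, 350.0, 100.0), "logp": (-2.0, 3.0, 1.5),
             "hbd": (0, 3, 2), "hba": (0, 6, 3)}


def desirability(value, low, high, tolerance):
    """1.0 inside [low, high]; falls linearly to 0.0 at `tolerance` outside it."""
    if low <= value <= high:
        return 1.0
    if tolerance <= 0:
        return 0.0
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / tolerance)


def profile_score(props, profile):
    """Geometric mean of per-property desirabilities (0.0 if any one is 0.0)."""
    values = [desirability(props[name], *bounds) for name, bounds in profile.items()]
    return math.prod(values) ** (1.0 / len(values))


def matches(props, profile, cutoff=0.8):
    """Label a compound by comparing its score with a hypothetical cutoff."""
    return profile_score(props, profile) >= cutoff


compound = {"mw": 420.0, "logp": 5.75, "hbd": 2, "hba": 7}
drug_score = profile_score(compound, DRUG_LIKE)
lead_score = profile_score(compound, LEAD_LIKE)
print(f"drug-like score {drug_score:.3f}, lead-like score {lead_score:.3f}")
```

The geometric mean is chosen so that a single completely unacceptable property (here logP against the lead-like range) zeroes the whole score; an arithmetic mean would be more forgiving.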
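Diversity is said to be measured with molecular frameworks and fingerprints. The following sketch makes simple assumptions: frameworks are already available as canonical strings (for instance Bemis-Murcko framework SMILES produced by a cheminformatics toolkit), and each fingerprint is a set of on-bit indices. Diversity is summarized by the number of distinct and singleton frameworks and by the mean pairwise Tanimoto distance; these summary statistics and the toy data are illustrative choices, not necessarily those reported in the paper.

```python
from collections import Counter
from itertools import combinations


def framework_stats(frameworks):
    """Return (distinct frameworks, frameworks carried by a single compound)."""
    counts = Counter(frameworks)
    singletons = sum(1 for n in counts.values() if n == 1)
    return len(counts), singletons


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def mean_pairwise_distance(fingerprints):
    """Average (1 - Tanimoto) over all unordered pairs; quadratic in set size."""
    total, pairs = 0.0, 0
    for a, b in combinations(fingerprints, 2):
        total += 1.0 - tanimoto(a, b)
        pairs += 1
    return total / pairs if pairs else 0.0


# Toy data for three compounds: framework strings and bit-set fingerprints.
frameworks = ["c1ccccc1", "c1ccccc1", "C1CCNCC1"]
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7}]

distinct, singletons = framework_stats(frameworks)
diversity = mean_pairwise_distance(fingerprints)
print(f"{distinct} frameworks ({singletons} singletons), "
      f"mean Tanimoto distance {diversity:.2f}")
```

The all-pairs loop grows quadratically, so for a library of 2.6 million compounds one would compute it on a random sample or use a cheaper estimate.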
Pages: 389-403
Page count: 15