LINGO, an efficient holographic text based method to calculate biophysical properties and intermolecular similarities

Cited by: 141
Authors
Vidal, D
Thormann, M
Pons, M
Affiliations
[1] Morphochem AG, D-81379 Munich, Germany
[2] Parc Cientif Barcelona, Lab Biomol NMR, Barcelona 08028, Spain
[3] Univ Barcelona, E-08028 Barcelona, Spain
DOI
10.1021/ci0496797
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
SMILES strings are the most compact text-based molecular representations. Implicitly they contain the information needed to compute all kinds of molecular structures and, thus, molecular properties derived from these structures. We show that this implicit information can be accessed directly at the SMILES string level, without the time-consuming explicit conversion of SMILES strings into molecular graphs or 3D structures followed by 2D or 3D QSPR calculations. Our method is based on the fragmentation of SMILES strings into overlapping substrings of a defined size that we call LINGOs. The integral set of LINGOs derived from a given SMILES string, the LINGO profile, is a hologram of the SMILES representation of the molecule described. LINGO profiles provide input for QSPR models and for the calculation of intermolecular similarities at very low computational cost. The octanol/water partition coefficient (LlogP) QSPR model achieved a correlation coefficient R² = 0.93, a root-mean-square error RRMS = 0.49 log units, a goodness-of-prediction correlation coefficient Q² = 0.89, and a QRMS = 0.61 log units. The intrinsic aqueous solubility (LlogS) QSPR model achieved correlation coefficient values of R² = 0.91 and Q² = 0.82, with RRMS = 0.60 and QRMS = 0.89 log units. Integral Tanimoto coefficients computed from LINGO profiles provided sharp discrimination between random and bioisosteric pairs extracted from the Accelrys Bioster Database. Average similarities (LINGOsim) were 0.07 for the random pairs and 0.36 for the bioisosteric pairs.
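The fragmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `lingo_profile` and `profile_tanimoto` are hypothetical, and the default substring length `q=4` is an assumption, since the abstract only says "a defined size". The abstract also does not give the formula for the integral Tanimoto coefficient, so the similarity below is a generic count-based (multiset) Tanimoto chosen for illustration, and it may differ from the paper's LINGOsim. Any SMILES preprocessing the paper may apply is omitted.

```python
from collections import Counter


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q in a SMILES string.

    q=4 is an assumed default; the abstract does not state the size.
    Strings shorter than q yield an empty profile.
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def profile_tanimoto(a: Counter, b: Counter) -> float:
    """Multiset Tanimoto: sum of min counts over sum of max counts.

    Illustrative stand-in only; not necessarily the paper's formula.
    Returns 0.0 when both profiles are empty.
    """
    keys = a.keys() | b.keys()
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


if __name__ == "__main__":
    pentanol = lingo_profile("CCCCCO")   # 1-pentanol
    butanol = lingo_profile("CCCCO")     # 1-butanol
    print(dict(pentanol))
    print(profile_tanimoto(pentanol, butanol))
```

Because the profile is built by a single pass over the string, no molecular graph or 3D structure is ever constructed, which is the source of the low computational cost the abstract refers to.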
Pages: 386-393
Page count: 8