Optimal Sparse Descriptor Selection for QSAR Using Bayesian Methods

Cited by: 65
Authors
Burden, F. R. [1]
Winkler, D. A.
Affiliations
[1] CSIRO Mol & Hlth Technol, Clayton, Vic 3168, Australia
Source
QSAR & COMBINATORIAL SCIENCE | 2009 / Vol. 28 / Issue 6-7
Keywords
Medicinal chemistry; Structure-activity relationships; Feature selection; Bayesian methods; Descriptors; ARTIFICIAL NEURAL-NETWORKS; SUPPORT VECTOR MACHINE; GENETIC ALGORITHMS; DRUG DISCOVERY; MOLECULAR DESCRIPTORS; VARIABLE SELECTION; MODELS; VALIDATION; PREDICTION; DOMAIN;
DOI
10.1002/qsar.200810173
Chinese Library Classification
R914 [Medicinal chemistry];
Discipline classification code
100701;
Abstract
Choosing the set of molecular descriptors (features) most relevant to a given biological response variable is an important problem in QSAR that has not been solved in an optimal, robust way. It belongs to an interesting and important class of mathematical problems in which the number of variables greatly exceeds the number of observations (grossly underdetermined systems). We used two Bayesian approaches to carry out this task on a suite of QSAR data sets. We employed a specialized sparse Bayesian feature reduction method, based on an EM algorithm with a Laplacian prior, to select a small set of the most relevant descriptors for modeling the response variables from a much larger pool of possibilities. Having chosen the optimum descriptors in a supervised manner, we used a Bayesian regularized neural network to carry out nonlinear regression and derive robust, parsimonious QSAR models for five drug data sets. Models were validated using independent test sets, and the results were compared with other contemporary descriptor selection methods. Issues around validating small QSAR data sets are also discussed in detail. The sparse feature selection algorithm proved to be an excellent, robust method for selecting descriptors for QSAR models, as it is supervised (descriptors are chosen in a context-dependent manner), parsimonious (models are not overly complex), and inherently interpretable. Coupled with a robust, parsimonious nonlinear modeling method such as the Bayesian regularized neural net, the combination provides a means of optimally modeling the data and of interpreting the model in terms of the most relevant descriptors.
Pages: 645-653
Page count: 9
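
The abstract describes selecting descriptors with an EM algorithm under a Laplacian prior but gives no algorithmic detail. The following is a minimal sketch of one common EM-style scheme for that prior (iteratively reweighted ridge updates on a linear model), not the authors' implementation. The function name `sparse_em_select`, the penalty parameter `lam`, the iteration settings, and the use of a plain linear model are all assumptions made for illustration.

```python
import numpy as np

def sparse_em_select(X, y, lam=1.0, n_iter=200, tol=1e-8):
    """EM-style MAP estimate of linear weights under a Laplacian prior.

    Hypothetical helper: X is (n_samples, n_descriptors), y is (n_samples,).
    Larger `lam` drives more weights towards zero.
    """
    n, p = X.shape
    # Start from a ridge solution so every weight is initially non-zero.
    w = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    for _ in range(n_iter):
        # E-step: the expected inverse prior variance of weight i is
        # proportional to 1/|w_i|; store u_i = sqrt(|w_i|) instead so that
        # no division by a vanishing weight is ever needed.
        u = np.sqrt(np.abs(w))
        # M-step: reweighted ridge solve in the stable form
        # w = U (lam*I + U X'X U)^-1 U X'y with U = diag(u).
        XU = X * u  # scales column i of X by u_i
        w_new = u * np.linalg.solve(lam * np.eye(p) + XU.T @ XU, XU.T @ y)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w
```

Descriptors whose weights shrink to (numerically) zero would be discarded, for example with `np.flatnonzero(np.abs(w) > 1e-3)`, and the survivors passed to the nonlinear model. The threshold here is likewise an assumption; the abstract does not state how the retained set is cut off.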