On the interpretation and interpretability of quantitative structure-activity relationship models

Times cited: 71
Authors
Guha, Rajarshi [1]
Affiliation
[1] Indiana Univ, Sch Informat, Bloomington, IN 47408 USA
Keywords
Quantitative structure-activity relationship (QSAR); Interpretation; Linear regression; Partial least squares (PLS); Neural network
DOI
10.1007/s10822-008-9240-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
The goal of a quantitative structure-activity relationship (QSAR) model is to encode the relationship between molecular structure and biological activity or physical property. Based on this encoding, such models can be used for predictive purposes. Assuming the use of relevant and meaningful descriptors and a statistically significant model, extracting the encoded structure-activity relationships (SARs) can provide insight into what makes a molecule active or inactive. Such analyses of QSAR models are useful in a number of scenarios, such as suggesting structural modifications to enhance activity, explaining outliers, and exploring novel SARs. In this paper we discuss the need for interpretation and give an overview of the factors that affect the interpretability of QSAR models. We then describe interpretation protocols for different types of models, highlighting the different types of interpretation, ranging from very broad, global trends to very specific, case-by-case descriptions of the SAR using examples from the training set. Finally, we discuss a number of case studies in which workers have provided some form of interpretation of a QSAR model.
Pages: 857-871
Number of pages: 15
Related papers (117 in total)
[1] Agrafiotis, DK; Cedeño, W. Feature selection for structure-activity correlation using binary particle swarms. JOURNAL OF MEDICINAL CHEMISTRY, 2002, 45(5): 1098-1107.
[2] Agrawal, VK; Sharma, R; Khadikar, PV. QSAR studies on antimalarial substituted phenyl analogues and their Nω-oxides. BIOORGANIC & MEDICINAL CHEMISTRY, 2002, 10(5): 1361-1366.
[3] [Anonymous]. Encyclopedia of Biostatistics, 1998.
[4] Arakawa, Masamoto; Hasegawa, Kiyoshi; Funatsu, Kimito. QSAR study of anti-HIV HEPT analogues based on multi-objective genetic programming and counter-propagation neural network. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2006, 83(2): 91-98.
[5] Bender, A; Mussa, HY; Glen, RC; Reiling, S. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44(5): 1708-1718.
[6] Besalú, E. MATCH-COMMUN MATH CO, 2001, p. 41.
[7] Breiman, L. Statistical modeling: The two cultures. STATISTICAL SCIENCE, 2001, 16(3): 199-215.
[8] Bremser, W. ANAL CHIM ACTA-COMP, 1978, 2: 355.
[9] Brown, Nathan; McKay, Ben; Gasteiger, Johann. A novel workflow for the inverse QSPR problem using multiobjective optimization. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2006, 20(5): 333-341.
[10] Burden, FR. Molecular identification number for substructure searches. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1989, 29(3): 225-227.