Best Practices for QSAR Model Development, Validation, and Exploitation

Cited by: 1399
Authors
Tropsha, Alexander [1]
Affiliations
[1] Univ N Carolina, Lab Mol Modeling & Carolina, Ctr Exploratory Cheminformat Res, UNC Eshelman Sch Pharm, Chapel Hill, NC 27599 USA
Keywords
QSAR modeling; Model validation; Virtual screening; Drug discovery; QUANTITATIVE STRUCTURE-ACTIVITY; K-NEAREST-NEIGHBOR; STRUCTURE-TOXICITY RELATIONSHIPS; APPLICABILITY DOMAINS; VARIABLE SELECTION; TETRAHYMENA-PYRIFORMIS; COMBINATORIAL QSAR; DATABASE; PREDICTION; IDENTIFICATION;
DOI
10.1002/minf.201000061
Chinese Library Classification
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
After nearly five decades "in the making", QSAR modeling has established itself as one of the major computational molecular modeling methodologies. Like any mature research discipline, QSAR modeling can be characterized by a collection of well-defined protocols and procedures that enable the expert application of the method for exploring and exploiting ever-growing collections of biologically active chemical compounds. This review examines the most critical QSAR modeling routines, which we regard as best practices in the field. We discuss these procedures in the context of an integrative predictive QSAR modeling workflow that is focused on achieving models of the highest statistical rigor and external predictive power. Specific elements of the workflow consist of data preparation, including chemical structure (and, when possible, associated biological data) curation, outlier detection, dataset balancing, and model validation. We especially emphasize procedures used to validate models, both internally and externally, as well as the need to define model applicability domains that should be used when models are employed for the prediction of external compounds or compound libraries. Finally, we present several examples of successful applications of QSAR models for virtual screening to identify experimentally confirmed hits.
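The abstract names external validation and applicability domains without showing what they look like in practice. The following is a minimal sketch of that idea, not the review's own protocol: a k-nearest-neighbor regressor is scored on a held-out external set, and predictions are made only for compounds that fall inside a distance-based applicability domain. The synthetic two-descriptor data, the function names, and the cutoff rule (mean plus `z` times the standard deviation of nearest-neighbor distances in the training set, with `z = 0.5`) are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict activity as the mean activity of the k nearest training compounds."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    return statistics.fmean([y for _, y in ranked[:k]])


def ad_cutoff(train_X, z=0.5):
    """Assumed cutoff: mean + z * std of nearest-neighbor distances within the training set."""
    nearest = [
        min(euclidean(a, b) for j, b in enumerate(train_X) if j != i)
        for i, a in enumerate(train_X)
    ]
    return statistics.fmean(nearest) + z * statistics.pstdev(nearest)


def in_domain(train_X, query, cutoff):
    """A query is in the domain if its nearest training compound lies within the cutoff."""
    return min(euclidean(a, query) for a in train_X) <= cutoff


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observed values."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def main():
    # Synthetic stand-in for a curated dataset: two descriptors, one activity value.
    rng = random.Random(0)
    X = [(rng.random(), rng.random()) for _ in range(60)]
    y = [2.0 * a - b + rng.gauss(0.0, 0.05) for a, b in X]

    # The external set is held out and never used to build the model or the domain.
    train_X, train_y = X[:45], y[:45]
    ext_X, ext_y = X[45:], y[45:]

    cutoff = ad_cutoff(train_X)
    covered = [(x, t) for x, t in zip(ext_X, ext_y) if in_domain(train_X, x, cutoff)]
    print(f"coverage: {len(covered)}/{len(ext_X)}")
    if len(covered) >= 2:
        preds = [knn_predict(train_X, train_y, x) for x, _ in covered]
        truths = [t for _, t in covered]
        print(f"external R^2: {r_squared(truths, preds):.3f}")


if __name__ == "__main__":
    main()
```

Reporting coverage alongside the external R^2 makes the trade-off visible: a tighter cutoff usually improves accuracy on the compounds that remain, at the cost of predicting fewer of them.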
Pages: 476-488
Page count: 13