Evolving interpretable structure-activity relationship models. 2. Using multiobjective optimization to derive multiple models

Cited by: 17
Authors
Birchall, Kristian [1]
Gillet, Valerie J. [1]
Harper, Gavin [2]
Pickett, Stephen D. [2]
Affiliations
[1] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England
[2] GlaxoSmithKline Inc, Med Res Ctr, Stevenage SG1 2NY, Herts, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
DOI
10.1021/ci800051h
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded in easy-to-interpret reduced graph queries which describe features that are preferentially present in active compounds compared to inactives. The MOEA addresses a limitation associated with many machine learning methods: the inherent tradeoff between recall and precision, which is usually handled by combining the two objectives into a single measure, with a consequent loss of control. By simultaneously optimizing recall and precision, the MOEA generates a family of SARs that lie on the precision-recall (PR) curve. The user is then able to select a query with an appropriate balance between the two objectives: for example, a low recall-high precision query may be preferred when establishing the SAR, whereas a high recall-low precision query may be more appropriate in a virtual screening context. Each query on the PR curve aims to capture the structure-activity information in a single representation, and each can be considered as an alternative (equally valid) solution. We then investigate combining individual queries into teams with the aim of capturing multiple SARs that may exist in a data set, for example, as is commonly seen in high-throughput screening data sets. Team formation is carried out iteratively as a postprocessing step following the evolution of the individual queries. The inclusion of uniqueness as a third objective within the MOEA provides an effective way of ensuring the queries are complementary in the active compounds they describe. Substantial improvements in both recall and precision are seen for some data sets. Furthermore, the resulting queries provide more detailed structure-activity information than is present in a single query.
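The tradeoff described in the abstract can be illustrated with a minimal sketch. It is not the authors' MOEA and does not evolve reduced graph queries; it only shows how precision and recall might be scored for a set of candidate queries and how the non-dominated ones (those on the PR curve) could be picked out. The compound IDs, the query names, and the functions precision_recall, dominates and pareto_front are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

Score = Tuple[float, float]  # (precision, recall)


def precision_recall(matched: Set[str], actives: Set[str]) -> Score:
    """Score one query from the set of compounds it matches."""
    hits = len(matched & actives)
    precision = hits / len(matched) if matched else 0.0
    recall = hits / len(actives) if actives else 0.0
    return precision, recall


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Score]) -> List[str]:
    """Names of the queries that no other query dominates, sorted by name."""
    return sorted(
        name
        for name, score in scores.items()
        if not any(
            dominates(other_score, score)
            for other, other_score in scores.items()
            if other != name
        )
    )


# Hypothetical toy data: four actives (a*) and some inactives (i*).
actives = {"a1", "a2", "a3", "a4"}
matches = {
    "q_strict": {"a1", "a2"},                                       # high precision
    "q_mid": {"a1", "a2", "a3", "i1"},                              # balanced
    "q_broad": {"a1", "a2", "a3", "a4", "i1", "i2", "i3", "i4"},    # high recall
    "q_poor": {"a1", "i1"},                                         # dominated
}

scores = {name: precision_recall(m, actives) for name, m in matches.items()}
front = pareto_front(scores)
print(front)
```

In this toy data set q_poor is dominated, while the other three queries each represent a different balance between precision and recall, which mirrors the choice the abstract leaves to the user.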
Pages: 1558-1570
Number of pages: 13