Cheminformatics analysis and learning in a data pipelining environment

Cited by: 157
Authors
Hassan, Moises
Brown, Robert D.
Varma-O'Brien, Shikha
Rogers, David
Affiliations
[1] SciTegic Inc, San Diego, CA 92121 USA
[2] Accelrys Inc, San Diego, CA 92121 USA
Keywords
Bayesian models; bioactivity prediction; data mining; data pipelining; maximal common substructure search; molecular fingerprints; molecular similarity; virtual screening
DOI
10.1007/s11030-006-9041-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Workflow technology is being increasingly applied in discovery information to organize and analyze data. SciTegic's Pipeline Pilot is a chemically intelligent implementation of a workflow technology known as data pipelining. It allows scientists to construct and execute workflows using components that encapsulate many cheminformatics-based algorithms. In this paper we review SciTegic's methodology for molecular fingerprints, molecular similarity, molecular clustering, maximal common subgraph search, and Bayesian learning. Case studies are described showing the application of these methods to the analysis of discovery data such as chemical series and high-throughput screening results. The paper demonstrates that the methods are well suited to a wide variety of tasks, such as building and applying predictive models of screening data, identifying molecules for lead optimization, and organizing molecules into families with structural commonality.
Pages: 283-299
Page count: 17
Related papers
47 in total
[1]  
AVIDON VV, 1978, KHIM FARM ZH+, V12, P88
[2]   The properties of known drugs .1. Molecular frameworks [J].
Bemis, GW ;
Murcko, MA .
JOURNAL OF MEDICINAL CHEMISTRY, 1996, 39 (15) :2887-2893
[3]   Molecular similarity: a key technique in molecular informatics [J].
Bender, A ;
Glen, RC .
ORGANIC & BIOMOLECULAR CHEMISTRY, 2004, 2 (22) :3204-3218
[4]   Molecular similarity searching using atom environments, information-based feature selection, and a naive Bayesian classifier [J].
Bender, A ;
Mussa, HY ;
Glen, RC ;
Reiling, S .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (01) :170-178
[5]   Informative library design as an efficient strategy to identify and optimize leads: Application to cyclin-dependent kinase 2 antagonists [J].
Bradley, EK ;
Miller, JL ;
Saiah, E ;
Grootenhuis, PDJ .
JOURNAL OF MEDICINAL CHEMISTRY, 2003, 46 (20) :4360-4364
[6]  
Breiman L., 1998, CLASSIFICATION REGRE
[7]  
BREMSER W, 1978, ANAL CHIM ACTA-COMP, V2, P355
[8]   Cell cycle molecular targets in novel anticancer drug discovery [J].
Buolamwini, JK .
CURRENT PHARMACEUTICAL DESIGN, 2000, 6 (04) :379-392
[9]  
DUBOIS JE, 1976, CHEM APPL GRAPH THEO, P161
[10]  
EVERITT, 1997, CLUSTER ANAL