Surrogate data - a secure way to share corporate data

Cited by: 7
Authors
Tetko, IV [1]
Abagyan, R
Oprea, TI
Affiliations
[1] Ukrainian Acad Sci, Inst Bioorgan & Petr Chem, Kiev, Ukraine
[2] Scripps Res Inst, La Jolla, CA 92037 USA
[3] Univ New Mexico, Sch Med, Div Biocomp, Albuquerque, NM 87131 USA
[4] GSF Forschungszentrum Umwelt & Gesundheit GMBH, Inst Bioinformat, D-85764 Neuherberg, Germany
Keywords
drug design; structure-property prediction; information content of a molecule; representation of molecules; surrogate data; lipophilicity prediction
DOI
10.1007/s10822-005-9013-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
The privacy of chemical structures is of paramount importance for the industrial sector, in particular for the pharmaceutical industry. At the same time, companies handle large amounts of physico-chemical and biological data that could be shared to improve our molecular understanding of pharmacokinetic and toxicological properties, which could improve predictivity and shorten drug development time, in particular in the early phases of drug discovery. The current study provides some theoretical limits on the information required to reverse engineer molecules from generated descriptors and demonstrates that the information content of molecules can be less than one bit per atom. Thus, theoretically, just one descriptor can be used to completely disclose the molecular structure. Instead of sharing descriptors, we propose to share surrogate data. Sharing surrogate data is nothing more than sharing reliably predicted molecules. The use of surrogate data can provide the same information as the original set. We consider the practical application of this idea to predicting the lipophilicity of chemical compounds, and we demonstrate that surrogate and real (original) data provide similar prediction ability. Thus, our proposed strategy makes it possible to share not only descriptors but also complete collections of surrogate molecules without the danger of disclosing the underlying molecular structures.
Pages: 749-764
Page count: 16