An efficient projection protocol for chemical databases: Singular value decomposition combined with truncated-Newton minimization

Cited by: 22
Authors
Xie, DX
Tropsha, A
Schlick, T
Affiliations
[1] NYU, Courant Inst Math Sci, Dept Chem, New York, NY 10012 USA
[2] NYU, Courant Inst Math Sci, Dept Math, New York, NY 10012 USA
[3] Howard Hughes Med Inst, New York, NY 10012 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2000, Vol. 40, No. 1
DOI
10.1021/ci990333j
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on four chemical datasets with real-valued descriptors (ranging from 58 to 27 255 compounds) show that the SVD/TNPACK projection duo achieves a reasonable accuracy in 2D, varying from 30% to about 100% of pairwise distance segments that lie within 10% of the original distances. The lowest percentages, corresponding to scaled datasets, can be made close to 100% with projections onto a 10-dimensional space. We also show that the SVD/TNPACK duo is efficient for minimizing the distance error objective function (especially for scaled datasets), and that TNPACK is much more efficient than a current popular approach of steepest descent minimization in this application context. Applications of our projection technique to similarity and diversity sampling in drug design can be envisioned.
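The projection step and the accuracy measure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it covers only the SVD projection and omits the TNPACK refinement, since the abstract does not give the exact objective function. The function names (`svd_project`, `pairwise_distances`, `fraction_within`), the mean-centering of the data, the synthetic 58-point dataset, and the reading of "within 10%" as a relative error of at most 0.10 are all assumptions made for the example.

```python
import numpy as np

def svd_project(X, k=2):
    """Project the rows of X onto the top-k singular directions.

    Assumption: the data are mean-centered first. Centering does not
    change pairwise distances, so the accuracy measure is unaffected.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]  # coordinates in the k-dimensional subspace

def pairwise_distances(Y):
    """Return the Euclidean distances for all unordered pairs of rows."""
    diff = Y[:, None, :] - Y[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(Y), k=1)
    return D[iu]

def fraction_within(X, Y, tol=0.10):
    """Fraction of pairs whose projected distance is within tol
    (relative error) of the original distance."""
    d_orig = pairwise_distances(X)
    d_proj = pairwise_distances(Y)
    mask = d_orig > 0  # skip coincident points to avoid division by zero
    rel = np.abs(d_proj[mask] - d_orig[mask]) / d_orig[mask]
    return float((rel <= tol).mean())

# Hypothetical data: 58 "compounds" with 20 real-valued descriptors,
# two of which carry most of the variance.
rng = np.random.default_rng(0)
scales = np.array([10.0, 5.0] + [0.1] * 18)
X = rng.normal(size=(58, 20)) * scales

Y2 = svd_project(X, k=2)
Y10 = svd_project(X, k=10)
acc2 = fraction_within(X, Y2)
acc10 = fraction_within(X, Y10)
print(f"2D: {acc2:.1%} of pairs within 10%; 10D: {acc10:.1%}")
```

Because the SVD projection is orthogonal, projected distances never exceed the originals and can only grow as `k` increases, which is consistent with the abstract's observation that a 10-dimensional projection recovers accuracy lost in 2D. The dense pairwise computation here uses memory quadratic in the number of compounds, so a dataset of the largest size mentioned (27 255 compounds) would need a blockwise or sampled variant.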
Pages: 167-177
Page count: 11