Pharmacophore Alignment Search Tool: Influence of Canonical Atom Labeling on Similarity Searching

Cited by: 10
Authors
Haehnke, Volker [1, 2]
Rupp, Matthias [3]
Krier, Mireille [4]
Rippmann, Friedrich [4]
Schneider, Gisbert [1]
Affiliations
[1] ETH, Inst Pharmaceut Sci, CH-8093 Zurich, Switzerland
[2] Goethe Univ Frankfurt, Chair Chem & Bioinformat, D-60323 Frankfurt, Germany
[3] German Res Ctr Environm Hlth, Helmholtz Zentrum Munchen, D-85764 Neuherberg, Germany
[4] Merck KGaA, Merck Serono Res Bio & Chemoinformat, D-64293 Darmstadt, Germany
Keywords
global alignment; line notation; molecular graph; similarity; virtual screening; CHEMICAL-STRUCTURES; SEQUENCE; ALGORITHM; GENERATION; LIBRARY
DOI
10.1002/jcc.21574
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
070301 [Inorganic Chemistry]
Abstract
Previously (Hahnke et al., J Comput Chem 2009, 30, 761), we presented the Pharmacophore Alignment Search Tool (PhAST), a ligand-based virtual screening technique that represents molecules as strings encoding pharmacophoric features and compares them by global pairwise sequence alignment. To guarantee unambiguity during the reduction of two-dimensional molecular graphs to one-dimensional strings, PhAST employs a graph canonization step. Here, we compare 11 graph canonization algorithms with respect to their impact on virtual screening. Retrospective screenings of a drug-like data set were evaluated using the BED-ROC metric, which yielded averaged values of 0.4 for the best-performing and 0.14 for the worst-performing canonization technique. We compared five scoring schemes for the alignments and found preferred combinations of canonization algorithms and scoring functions. Finally, we introduce a performance index that helps prioritize canonization approaches without the need for extensive retrospective evaluation. © 2010 Wiley Periodicals, Inc. J Comput Chem 31: 2810-2826, 2010
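The string comparison step named in the abstract, global pairwise sequence alignment, can be illustrated with a minimal Needleman-Wunsch sketch. It is not the PhAST implementation: the function name `global_alignment_score`, the one-letter feature symbols, and the match, mismatch and gap values are all assumptions chosen for illustration, since the abstract specifies neither the symbol alphabet nor the five scoring schemes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    a, b: strings whose characters stand for pharmacophoric feature
    symbols (hypothetical alphabet, not taken from the paper).
    match, mismatch, gap: illustrative scores (assumed values).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`
    # costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: aligning a prefix of `a` against the empty string.
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            gap_in_b = prev[j] + gap      # character of `a` aligned to a gap
            gap_in_a = curr[j - 1] + gap  # character of `b` aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    # The bottom-right cell holds the score of the best alignment that
    # spans both strings end to end.
    return prev[-1]


if __name__ == "__main__":
    # Two made-up feature strings that differ by one missing symbol.
    print(global_alignment_score("ADLR", "ADR"))
```

Because the alignment is global, every symbol of both strings contributes to the score, which is why the order in which atoms are written into the string (the outcome of canonization) can change the similarity value.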
Pages: 2810-2826
Page count: 17