A novel hybrid ultrafast shape descriptor method for use in virtual screening

Cited by: 34
Authors
Cannon, Edward O. [1]
Nigsch, Florian [1]
Mitchell, John B. O. [1]
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
DOI
10.1186/1752-153X-2-3
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Background: We introduce a new Hybrid descriptor that combines the MACCS key descriptor, which encodes topological information, with Ballester and Richards' Ultrafast Shape Recognition (USR) descriptor. The latter is calculated from the moments of the distribution of the interatomic distances; in this work we also include higher moments than those used in the original implementation.

Results: The performance of the Hybrid descriptor is assessed using Random Forest and a dataset of 116,476 molecules. The dataset comprises 5,245 molecules in ten classes from the 2005 World Anti-Doping Agency (WADA) dataset and 111,231 molecules from the National Cancer Institute (NCI) database. In a 10-fold Monte Carlo cross-validation, the dataset was partitioned into three distinct parts: one for training, one for optimising an internal threshold that we introduced, and one for validating the resulting model. The standard errors obtained were used to assess the statistical significance of the observed improvements in performance of the new descriptor.

Conclusion: The Hybrid descriptor was compared with the MACCS key descriptor, with USR using the first three (USR), four (UF4) and five (UF5) moments, and with a combination of MACCS and USR (three moments). MACCS was not combined with UF5 because UF5 and UF4 performed similarly. The MACCS/UF4 Hybrid descriptor outperformed all other descriptors examined on every figure of merit: recall in the top 1% and top 5% of the ranked validation sets, precision, F-measure, area under the Receiver Operating Characteristic curve, and Matthews Correlation Coefficient.
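The descriptor construction summarised in the abstract can be illustrated with a minimal sketch. It assumes the usual USR formulation, in which atom distances are measured to four reference points (the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that atom) and the first few moments of each distance distribution are collected. The function names, the use of raw central moments without normalisation, and the plain concatenation with the MACCS bits are assumptions made for illustration; they are not taken from the paper.

```python
import math


def _dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _moments(values, n_moments):
    """Mean followed by raw central moments of order 2..n_moments.

    Assumption: no normalisation or root is applied to the higher moments;
    published implementations differ on this point.
    """
    n = len(values)
    mean = sum(values) / n
    out = [mean]
    for k in range(2, n_moments + 1):
        out.append(sum((v - mean) ** k for v in values) / n)
    return out


def usr_descriptor(coords, n_moments=3):
    """USR-style shape descriptor: n_moments values per reference point."""
    n = len(coords)
    # Reference point 1: molecular centroid.
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [_dist(c, ctd) for c in coords]
    # Reference points 2 and 3: atoms closest to and farthest from the centroid.
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: atom farthest from reference point 3.
    d_fct = [_dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]

    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([_dist(c, ref) for c in coords], n_moments))
    return descriptor


def hybrid_descriptor(maccs_bits, coords, n_moments=4):
    """Hypothetical hybrid: MACCS bits followed by the shape moments."""
    return list(maccs_bits) + usr_descriptor(coords, n_moments)


# Toy coordinates, not a real molecule.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
usr3 = usr_descriptor(toy_coords, n_moments=3)  # 12 values, as in the USR variant
uf4 = usr_descriptor(toy_coords, n_moments=4)   # 16 values, as in the UF4 variant
```

With three, four or five moments the shape part has 12, 16 or 20 components, which corresponds to the USR, UF4 and UF5 variants named in the abstract. A real pipeline would take the coordinates from a 3D conformer and the bits from a cheminformatics toolkit.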
Pages: 9
Related papers
34 in total
  • [1] The use of consensus scoring in ligand-based virtual screening
    Baber, JC
    William, AS
    Gao, YH
    Feher, M
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (01) : 277 - 288
  • [2] Assessing the accuracy of prediction algorithms for classification: an overview
    Baldi, P
    Brunak, S
    Chauvin, Y
    Andersen, CAF
    Nielsen, H
    [J]. BIOINFORMATICS, 2000, 16 (05) : 412 - 424
  • [3] Ultrafast shape recognition for similarity search in molecular databases
    Ballester, Pedro J.
    Richards, W. Graham
    [J]. PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2007, 463 (2081) : 1307 - 1321
  • [4] 2D QSAR consensus prediction for high-throughput virtual screening. An application to COX-2 inhibition modeling and screening of the NCI database
    Baurin, N
    Mozziconacci, JC
    Arnoult, E
    Chavatte, P
    Marot, C
    Morin-Allory, L
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (01) : 276 - 285
  • [5] Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance
    Bender, A
    Mussa, HY
    Glen, RC
    Reiling, S
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (05) : 1708 - 1718
  • [6] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [7] Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds
    Cannon, Edward O.
    Amini, Ata
    Bender, Andreas
    Sternberg, Michael J. E.
    Muggleton, Stephen H.
    Glen, Robert C.
    Mitchell, John B. O.
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2007, 21 (05) : 269 - 280
  • [8] Cannon EO, 2006, LECT NOTES COMPUT SC, V4216, P173
  • [9] Chemoinformatics-based classification of prohibited substances employed for doping in sport
    Cannon, Edward O.
    Bender, Andreas
    Palmer, David S.
    Mitchell, John B. O.
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (06) : 2369 - 2380
  • [10] Similarity searching and clustering of chemical-structure databases using molecular property data
    Downs, GM
    Willett, P
    Fisanick, W
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1994, 34 (05) : 1094 - 1102