Investigation of classification methods for the prediction of activity in diverse chemical libraries

Cited by: 34
Authors
Dixon, SL [1]
Villar, HO [1]
Affiliation
[1] Telik Inc, S San Francisco, CA 94080 USA
Keywords
chemoinformatics; classification; cluster analysis; discriminant analysis; recursive partitioning; topological descriptors; 2D structural keys
DOI
10.1023/A:1008061017938
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Classification methods based on linear discriminant analysis, recursive partitioning, and hierarchical agglomerative clustering are examined for their ability to separate active and inactive compounds in a diverse chemical database. Topology-based descriptions of chemical structure from the Molconn-X and ISIS programs are used in conjunction with these classification techniques to identify ACE inhibitors, beta-adrenergic antagonists, and H-2 receptor antagonists. Overall, discriminant analysis misclassifies the smallest number of active compounds, while recursive partitioning yields the lowest rate of misclassification among inactives. Binary structural keys from the ISIS package are found to generally outperform the whole-molecule Molconn-X descriptors, especially for identification of inactive compounds. For all targets and classification methods, sensitivity toward active compounds is increased by making repetitive classifications using training sets that contain equal numbers of actives and inactives. These balanced training sets provide an average numerical class membership score which may be used to select subsets of compounds that are enriched in actives.
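The balanced-training-set idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a two-class Fisher linear discriminant stands in for the discriminant analysis, the descriptors are synthetic Gaussian data rather than Molconn-X or ISIS output, and the function names (`fisher_lda_fit`, `balanced_ensemble_scores`), the ridge term, and the number of rounds are all hypothetical choices made for the example.

```python
import numpy as np


def fisher_lda_fit(X, y, ridge=1e-3):
    """Fit a two-class Fisher discriminant; score(x) = x @ w + b, positive means 'active'."""
    X1, X0 = X[y == 1], X[y == 0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    # Summed within-class covariance; the small ridge (an assumption) keeps it invertible.
    Sw = np.cov(X1, rowvar=False) + np.cov(X0, rowvar=False)
    Sw = Sw + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    b = -0.5 * w @ (m1 + m0)  # decision boundary midway between the class means
    return w, b


def balanced_ensemble_scores(X_train, y_train, X_test, n_rounds=25, seed=0):
    """Average class-membership score over repeated balanced training sets.

    Each round trains on all actives plus an equal number of randomly drawn
    inactives, then records a 0/1 vote per test compound. The returned score
    is the fraction of rounds in which a compound was classified as active.
    """
    rng = np.random.default_rng(seed)
    actives = np.flatnonzero(y_train == 1)
    inactives = np.flatnonzero(y_train == 0)
    votes = np.zeros(len(X_test))
    for _ in range(n_rounds):
        sampled = rng.choice(inactives, size=len(actives), replace=False)
        idx = np.concatenate([actives, sampled])
        w, b = fisher_lda_fit(X_train[idx], y_train[idx])
        votes += (X_test @ w + b > 0)
    return votes / n_rounds


def make_synthetic(n_act, n_inact, n_desc, rng):
    """Synthetic stand-in for a descriptor matrix: actives are shifted by 1.5 per descriptor."""
    X = np.vstack([
        rng.normal(1.5, 1.0, (n_act, n_desc)),
        rng.normal(0.0, 1.0, (n_inact, n_desc)),
    ])
    y = np.array([1] * n_act + [0] * n_inact)
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_train, y_train = make_synthetic(30, 600, 4, rng)
    X_test, y_test = make_synthetic(20, 400, 4, rng)
    scores = balanced_ensemble_scores(X_train, y_train, X_test)
    # Rank by score and inspect how many actives fall in the top-scoring subset.
    top = np.argsort(-scores)[:40]
    print("actives in top 40:", int(y_test[top].sum()), "of", int(y_test.sum()))
```

Ranking compounds by the averaged score and taking the top of the list is one way to select a subset enriched in actives, as the abstract describes; the cutoff of 40 used here is arbitrary.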
Pages: 533-545
Page count: 13