Using Graph Indices for the Analysis and Comparison of Chemical Datasets

Cited by: 23
Authors
Fourches, Denis [1]
Tropsha, Alexander [1]
Affiliations
[1] Univ N Carolina, Lab Mol Modeling, Eshelman Sch Pharm, Chapel Hill, NC 27599 USA
Keywords
Chemical dataset graph; Graph indices; QSAR; ADDAGRA; Differentiation
DOI
10.1002/minf.201300076
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
In cheminformatics, compounds are represented as points in a multidimensional space of chemical descriptors. When all pairs of points found within a certain distance threshold in the original high-dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as a Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or the Randić connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications.
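The Dataset Graph construction described in the abstract can be illustrated with a minimal sketch. This is not the ADDAGRA implementation. The function names, the use of Euclidean distance, the inclusive threshold test, and the toy 2-D descriptor vectors are all assumptions made for illustration. The Randić index here is the standard unweighted form (the sum over edges of 1/sqrt(deg(u) * deg(v))), which ignores the distance labels on the edges.

```python
import math
from itertools import combinations


def build_dataset_graph(points, threshold):
    """Connect every pair of points whose distance is <= threshold.

    Assumption: Euclidean distance in descriptor space.
    Returns an adjacency dict: vertex index -> {neighbor index: distance}.
    """
    graph = {i: {} for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d <= threshold:
            # Store the distance as the edge label, in both directions.
            graph[i][j] = d
            graph[j][i] = d
    return graph


def average_vertex_degree(graph):
    """Mean number of neighbors per vertex (isolated vertices count as 0)."""
    if not graph:
        return 0.0
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def randic_index(graph):
    """Unweighted Randic connectivity index: sum over edges of 1/sqrt(du*dv)."""
    total = 0.0
    for i, nbrs in graph.items():
        for j in nbrs:
            if i < j:  # visit each undirected edge once
                total += 1.0 / math.sqrt(len(graph[i]) * len(graph[j]))
    return total


# Hypothetical toy dataset: four compounds described by two descriptors each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
dg = build_dataset_graph(points, threshold=1.5)

print(average_vertex_degree(dg))
print(randic_index(dg))
```

In this toy case the first three points form a triangle and the fourth stays isolated, so a dense cluster and an outlier are both visible in the indices. Comparing such indices across a training set and a test set built with the same threshold is one way to read the comparison the abstract describes.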
Pages: 827-842
Number of pages: 16