gBoost: a mathematical programming approach to graph classification and regression

被引:104
作者
Saigo, Hiroto [1 ]
Nowozin, Sebastian [1 ]
Kadowaki, Tadashi [2 ]
Kudo, Taku [3 ]
Tsuda, Koji [1 ]
机构
[1] Max Planck Inst Biol Cybernet, D-72076 Tubingen, Germany
[2] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Uji, Kyoto 6110011, Japan
[3] Google Japan Inc, Shibuya Ku, Tokyo 1508512, Japan
关键词
Graph mining; Mathematical programming; Classification; Regression; QSAR; LARGE DIVERSE SET; SELECTION; KERNELS;
D O I
10.1007/s10994-008-5089-z
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.
引用
收藏
页码:69 / 89
页数:21
相关论文
共 45 条
[1]
[Anonymous], 2000, Data on the Web-From Relations to Semistructured Data and XML
[2]
[Anonymous], 2004, P 10 ACM SIGKDD INT
[3]
[Anonymous], P ICML
[4]
Boyd S.P., 2004, Berichte UberVerteilte Messysteme
[5]
Bringmann B, 2006, LECT NOTES ARTIF INT, V4213, P55
[6]
Cai Lijuan, 2004, PROC 13 ACM INTERNAT, P78
[7]
Cohen W. W., 1995, P 12 INT C MACH LEAR, P115, DOI DOI 10.1016/B978-1-55860-377-6.50023-2
[8]
Linear programming boosting via column generation [J].
Demiriz, A ;
Bennett, KP ;
Shawe-Taylor, J .
MACHINE LEARNING, 2002, 46 (1-3) :225-254
[9]
Stabilized column generation [J].
du Merle, O ;
Villeneuve, D ;
Desrosiers, J ;
Hansen, P .
DISCRETE MATHEMATICS, 1999, 194 (1-3) :229-237
[10]
Reoptimization of MDL keys for use in drug discovery [J].
Durant, JL ;
Leland, BA ;
Henry, DR ;
Nourse, JG .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2002, 42 (06) :1273-1280