Multi-instance learning based web mining

Cited by: 113
Authors
Zhou, ZH [1]
Jiang, K [1]
Li, M [1]
Affiliations
[1] Nanjing Univ, Natl Lab Novel Software Technol, Nanjing 210093, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
machine learning; data mining; multi-instance learning; web mining; web index recommendation; text categorization;
DOI
10.1007/s10489-005-5602-z
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multi-instance learning, the training set comprises labeled bags that are composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this paper, a web mining problem, namely web index recommendation, is investigated from a multi-instance view. Specifically, each web index page is regarded as a bag, and each of its linked pages is regarded as an instance. A user favoring an index page means that he or she is interested in at least one page linked from the index. Based on the user's browsing history, recommendations can be provided for unseen index pages. An algorithm named Fretcit-kNN is proposed to solve the problem; it employs the minimal Hausdorff distance between frequent term sets and uses both the references and the citers of an unseen bag to determine its label. Experiments show that, on average, Fretcit-kNN achieves a recommendation accuracy of 81.0% with 71.7% recall and 70.9% precision, significantly better than the best algorithm that does not consider the specific characteristics of multi-instance learning, which achieves 76.3% accuracy with 63.4% recall and 66.1% precision.
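The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each linked page is already reduced to a set of frequent terms, that Jaccard distance is an acceptable stand-in for the distance between two term sets (the abstract does not specify it), and that ties are resolved as "not recommended". The names `term_set_distance`, `min_hausdorff`, `predict`, `num_refs`, and `num_citers` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

TermSet = FrozenSet[str]   # frequent terms of one linked page (an instance)
Bag = List[TermSet]        # one index page = the term sets of its linked pages


def term_set_distance(a: TermSet, b: TermSet) -> float:
    """Jaccard distance; an assumed stand-in, not taken from the paper."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def min_hausdorff(bag_a: Bag, bag_b: Bag) -> float:
    """Minimal Hausdorff distance: the distance of the closest instance pair.

    Both bags are assumed to be non-empty.
    """
    return min(term_set_distance(a, b) for a in bag_a for b in bag_b)


def predict(train: List[Tuple[Bag, bool]], query: Bag,
            num_refs: int = 2, num_citers: int = 2) -> bool:
    """Label an unseen bag by a vote of its references and its citers."""
    dists = [min_hausdorff(query, bag) for bag, _ in train]

    # References: the num_refs training bags nearest to the query bag.
    order = sorted(range(len(train)), key=lambda i: dists[i])
    votes = [train[i][1] for i in order[:num_refs]]

    # Citers: training bags that rank the query among their own
    # num_citers nearest neighbours (other training bags plus the query).
    for i, (bag_i, label_i) in enumerate(train):
        closer = sum(
            1 for j, (bag_j, _) in enumerate(train)
            if j != i and min_hausdorff(bag_i, bag_j) < dists[i]
        )
        if closer < num_citers:
            votes.append(label_i)

    positives = sum(votes)
    # Assumption: a tie is treated as "not recommended".
    return positives > len(votes) - positives
```

A bag that is both a reference and a citer votes twice in this sketch, which mirrors the general references-plus-citers voting scheme; whether Fretcit-kNN weights the two groups differently is not stated in the abstract.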
Pages: 135-147
Number of pages: 13