SVM-based interactive document retrieval with active learning

Cited by: 2
Authors
Onoda, Takashi [1]
Murata, Hiroshi [1]
Yamada, Seiji [2]
Affiliations
[1] Cent Res Inst Elect Power Ind, Tokyo 2018511, Japan
[2] SOKENDAI, Nat Inst Informat, Tokyo 1018430, Japan
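Keywords
document retrieval; relevance feedback; support vector machines; active learning
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract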
This paper describes an application of support vector machines (SVMs) to interactive document retrieval using active learning. Previous work has applied classification learning methods such as SVMs to relevance feedback and obtained successful results. However, that work did not fully exploit the characteristics of the example distribution in document retrieval. We propose a heuristic that biases which documents are shown to the user for judgement according to the distribution of examples in document retrieval. The heuristic selects the examples to show the user from the neighborhood of the positive support vectors, and it improves learning efficiency. We implemented an SVM-based interactive document retrieval system using the proposed heuristic and compared it with conventional systems such as a Rocchio-based system and an SVM-based system without the heuristic. We conducted systematic experiments on large data sets containing over 500,000 newspaper articles and confirmed that our system outperformed the others.
Pages: 49-61
Page count: 13
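
The abstract's selection heuristic can be illustrated with a small sketch. It is one possible reading, not the authors' implementation: it assumes the SVM decision values f(x) of the unlabeled documents are already available, and it interprets "neighbors of positive support vectors" as the documents whose decision value is closest to +1, the value taken by support vectors lying on the positive margin. The function names and the example scores are hypothetical. For contrast, the sketch also includes the common active-learning rule of picking the documents closest to the separating hyperplane, f(x) = 0.

```python
from typing import List, Sequence


def rank_by_distance(scores: Sequence[float], target: float, k: int) -> List[int]:
    """Return indices of the k scores closest to target (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (abs(scores[i] - target), i))
    return order[:k]


def select_near_positive_margin(scores: Sequence[float], k: int) -> List[int]:
    """Assumed reading of the heuristic: prefer documents whose decision
    value is nearest to +1, where positive margin support vectors lie."""
    return rank_by_distance(scores, 1.0, k)


def select_near_hyperplane(scores: Sequence[float], k: int) -> List[int]:
    """Conventional uncertainty sampling: prefer documents whose decision
    value is nearest to 0, the separating hyperplane."""
    return rank_by_distance(scores, 0.0, k)


# Hypothetical decision values f(x) for six unlabeled documents.
scores = [-1.4, -0.2, 0.1, 0.7, 1.1, 2.5]

# Documents to show the user for relevance judgement under each rule.
print(select_near_positive_margin(scores, 2))
print(select_near_hyperplane(scores, 2))
```

In an interactive loop, the user's judgements on the selected documents would be added to the training set and the SVM retrained before the next round of selection.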