A scalable topic-based open source search engine

Cited by: 12
Authors
Buntine, W [1 ]
Löfström, J [1 ]
Perkiö, J [1 ]
Perttu, S [1 ]
Poroshin, V [1 ]
Silander, T [1 ]
Tirri, H [1 ]
Tuominen, A [1 ]
Tuulos, V [1 ]
Affiliations
[1] Helsinki Inst Informat Technol, Complex Syst Computat Grp, FIN-02015 Helsinki, Finland
Source
IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS | 2004
DOI
10.1109/WI.2004.10094
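Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology];
Abstract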
Site-based or topic-specific search engines work with mixed success because of the general difficulty of the information retrieval task, and the lack of good link information to allow authorities to be identified. We are advocating an open source approach to the problem due to its scope and need for software components. We have adopted a topic-based search engine because it represents the next generation of capability. This paper outlines our scalable system for site-based or topic-specific search, and demonstrates the developing system on a small 250,000-document collection of EU and UN web pages.
Pages: 228-234
Page count: 7