What is a tall poppy among Web pages?

Cited by: 11
Authors
Pringle, G [1]
Allison, L [1]
Dowe, DL [1]
Affiliation
[1] Monash Univ, Sch Comp Sci & Software Engn, Clayton, Vic 3168, Australia
Source
COMPUTER NETWORKS AND ISDN SYSTEMS | 1998 / Vol. 30 / Issues 1-7
Keywords
search engines; machine learning;
DOI
10.1016/S0169-7552(98)00061-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Search engines and indices were created to help people find information amongst the rapidly increasing number of World Wide Web (WWW) pages. The search engines automatically visit and index pages so that they can return good matches for their users' queries. The way this indexing is done varies from engine to engine, and the details are usually secret, although the strategy is sometimes made public in general terms. The search engines' aim is to return relevant pages quickly. On the other hand, the author of a Web page has a vested interest in it rating highly, for appropriate queries, on as many search engines as possible. Some authors have an interest in their page rating well for a great many types of query indeed: spamming has come to the Web. We treat modelling the workings of WWW search engines as an inductive inference problem. A training set is collected, consisting of pages returned in response to typical queries. Decision trees are used as the model class for the search engines' selection criteria, although this is not to say that search engines actually contain decision trees. A machine learning program is used to infer a decision tree for each search engine, with an information-theoretic criterion directing the inference and preventing over-fitting. (C) 1998 Published by Elsevier Science B.V. All rights reserved.
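The abstract's idea of inferring a decision tree per engine, with an information-theoretic criterion that stops the tree from over-fitting, can be illustrated with a small sketch. It assumes hypothetical boolean page features (`term_in_title`, `term_in_url`) and a 0/1 label for whether an engine returned the page. The cost used here (1 bit per node to say leaf or split, log2 of the number of attributes to name a split attribute, and an adaptive Laplace code for the class labels in each leaf) is a simplified stand-in chosen for illustration, not the coding scheme of Wallace and Patrick [4] or the authors' program. A split is kept only if it shortens the total message.

```python
import math


def leaf_bits(pos: int, neg: int) -> float:
    """Bits to send pos/neg labels with an adaptive (Laplace) code.

    Equals -log2(pos! * neg! / (n + 1)!), the marginal likelihood of the
    labels under a uniform prior on the class probability.
    """
    n = pos + neg
    nats = math.lgamma(n + 2) - math.lgamma(pos + 1) - math.lgamma(neg + 1)
    return nats / math.log(2)


def counts(labels):
    pos = sum(labels)
    return pos, len(labels) - pos


def infer_tree(rows, labels, attrs):
    """Greedily grow a tree; return (message_length_in_bits, tree).

    rows:   list of dicts mapping attribute name -> bool (hypothetical features)
    labels: list of 0/1 (1 = engine returned the page, by assumption)
    """
    pos, neg = counts(labels)
    leaf_cost = 1.0 + leaf_bits(pos, neg)  # 1 bit says "leaf", then the labels
    leaf = ("leaf", pos, neg)

    # Only attributes that actually separate the rows are worth testing.
    candidates = [a for a in attrs if 0 < sum(bool(r[a]) for r in rows) < len(rows)]
    if not candidates:
        return leaf_cost, leaf

    def one_level_bits(a):
        # Cost of the labels if we split on `a` and stop at two leaves.
        yes = [lab for r, lab in zip(rows, labels) if r[a]]
        no = [lab for r, lab in zip(rows, labels) if not r[a]]
        return sum(1.0 + leaf_bits(*counts(part)) for part in (yes, no))

    best = min(candidates, key=one_level_bits)
    rest = [a for a in attrs if a != best]
    children = []
    child_bits = 0.0
    for branch in (True, False):
        idx = [i for i, r in enumerate(rows) if bool(r[best]) == branch]
        bits, sub = infer_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
        child_bits += bits
        children.append(sub)

    # 1 bit says "split", log2(len(attrs)) bits name the attribute.
    split_cost = 1.0 + math.log2(len(attrs)) + child_bits
    if split_cost < leaf_cost:
        return split_cost, ("split", best, children[0], children[1])
    return leaf_cost, leaf


def predict(tree, row):
    """Follow splits to a leaf and return its majority class."""
    while tree[0] == "split":
        _, attr, yes, no = tree
        tree = yes if row[attr] else no
    _, pos, neg = tree
    return pos > neg


# Toy data (invented): the label depends only on term_in_title.
rows = [{"term_in_title": i < 10, "term_in_url": i % 2 == 0} for i in range(20)]
labels = [1 if i < 10 else 0 for i in range(20)]
_, tree = infer_tree(rows, labels, ["term_in_title", "term_in_url"])
print(tree)  # ('split', 'term_in_title', ('leaf', 10, 0), ('leaf', 0, 10))

# Labels unrelated to the feature: the split does not pay for itself.
noise_rows = [{"a": v} for v in (True, True, False, False, True, True, False, False)]
noise_labels = [1, 0, 1, 0, 1, 0, 1, 0]
print(infer_tree(noise_rows, noise_labels, ["a"])[1])  # ('leaf', 4, 4)
```

The point of the design is that every split must pay for its own description: in the noise-only case the two children cost more bits than the single leaf they would replace, so the tree stays a leaf. The greedy one-level lookahead is a simplification of this sketch, not a claim about how the paper searches the space of trees.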
Pages: 369-377
Page count: 9
References (4 entries)
[1] Baxter, R. A. (1994). Proc. 4th IEEE Data Compression Conf., p. 498.
[2] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning.
[3] Quinlan, J. R. (1989). Information and Computation, 80, p. 227. DOI 10.1016/0890-5401(89)90010-2
[4] Wallace, C. S., & Patrick, J. D. (1993). Coding decision trees. Machine Learning, 11(1), 7-22.