Yahoo! as an ontology - Using Yahoo! categories to describe documents

Cited by: 83
Authors
Labrou, Y [1]
Finin, T [1]
Affiliation
[1] Univ Maryland, Dept Comp Sci & Elect Engn, Baltimore, MD 21250 USA
Source
PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99 | 1999
DOI
10.1145/319950.319976
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
We suggest that one (or a collection) of names of Yahoo! (or any other WWW indexer's) categories can be used to describe the content of a document. Such categories offer a standardized and universal way of referring to or describing the nature of real-world objects, activities, documents and so on, and may be used (we suggest) to semantically characterize the content of documents. WWW indices, like Yahoo!, provide a huge hierarchy of categories (topics) that touch every aspect of human endeavor. Such topics can be used as descriptors, similar to the way librarians use, for example, the Library of Congress cataloging system to annotate and categorize books. In the course of investigating this idea, we address the problem of automatic categorization of webpages in the Yahoo! directory. We use Telltale as our classifier; Telltale uses n-grams to compute the similarity between documents. We experiment with various types of descriptions for the Yahoo! categories and the webpages to be categorized. Our findings suggest that the best results occur when using the very brief descriptions of the Yahoo! categorized entries; these brief descriptions are provided either by the entries' submitters or by the Yahoo! human indexers and accompany most Yahoo!-indexed entries.
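The abstract says only that the classifier uses n-grams to compute similarity between documents; it does not give Telltale's actual algorithm. As a rough illustration of the general idea, the sketch below counts character trigrams and compares documents by cosine similarity, then assigns a page to the category whose description scores highest. The choice of trigrams and cosine similarity, the function names, and the category names and descriptions are all assumptions made for this example, not details from the paper.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(page_text, category_descriptions, n=3):
    """Return (category, score) for the description most similar to the page."""
    page_vec = char_ngrams(page_text, n)
    scores = {
        name: cosine_similarity(page_vec, char_ngrams(desc, n))
        for name, desc in category_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical category names and brief descriptions, invented for illustration.
categories = {
    "Recreation/Sports/Soccer": "soccer clubs, football leagues, match scores and players",
    "Computers/Programming/Languages": "programming languages, compilers, source code tutorials",
}
best, score = categorize(
    "A tutorial on writing compilers for programming languages", categories
)
print(best, round(score, 3))
```

Representing each category by short descriptive text, as in the hypothetical dictionary above, loosely mirrors the abstract's finding that brief entry descriptions gave the best results; any real system would need far more text per category and a proper evaluation.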
Pages: 180-187
Page count: 8