Finding authoritative people from the web

Cited by: 8
Authors
Harada, M [1]
Sato, SY [1]
Kazama, K [1]
Affiliations
[1] Network Innovat Labs, Nippon Telegraph & Telephone Corp, Musashino, Tokyo 1808585, Japan
Source
JCDL 2004: PROCEEDINGS OF THE FOURTH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: GLOBAL REACH AND DIVERSE IMPACT | 2004
Keywords
Web mining; text analysis; proper name extraction; question answering
DOI
10.1145/996350.996420
Chinese Library Classification code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Today's web is so huge and diverse that it arguably reflects the real world. For this reason, searching the web is a promising approach to find things in the real world. This paper presents NEXAS, an extension to web search engines that attempts to find real-world entities relevant to a topic. Its basic idea is to extract proper names from the web pages retrieved for the topic. A main advantage of this approach is that users can query any topic and learn about relevant real-world entities without dedicated databases for the topic. In particular, we focus on an application for finding authoritative people from the web. This application is practically important because once personal names are obtained, they can lead users from the web to managed information stored in digital libraries. To explore effective ways of finding people, we first examine the distribution of Japanese personal names by analyzing about 50 million Japanese web pages. We observe that personal names appear frequently on the web, but the distribution is highly influenced by automatically generated texts. To remedy the bias and find widely acknowledged people accurately, we utilize the number of web servers containing a name instead of the number of web pages. We show its effectiveness by an experiment covering a wide range of topics. Finally, we demonstrate several examples and suggest possible applications.
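The server-count idea in the abstract can be illustrated with a minimal sketch. It assumes proper names have already been extracted from each retrieved page by some upstream step, and it treats the URL hostname as a stand-in for "web server"; the paper may define a server differently. The function name `rank_names_by_server_count` and the sample data are hypothetical and are not taken from NEXAS.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def rank_names_by_server_count(pages):
    """Rank names by the number of distinct hosts whose pages mention them.

    pages: iterable of (url, names) pairs, where names is the list of
    personal names already extracted from that page (assumed input).
    """
    hosts_by_name = defaultdict(set)
    for url, names in pages:
        host = urlsplit(url).hostname  # assumption: hostname ~ web server
        if host is None:
            continue  # skip URLs without a recognizable host
        for name in set(names):
            hosts_by_name[name].add(host)
    # Sort by descending host count, then by name for a stable order.
    ranked = sorted(hosts_by_name.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(name, len(hosts)) for name, hosts in ranked]


# Hypothetical data: "Name A" appears on four pages of one host (for
# example, automatically generated pages); "Name B" appears on three hosts.
sample_pages = [
    ("http://a.example/1", ["Name A", "Name B"]),
    ("http://a.example/2", ["Name A"]),
    ("http://a.example/3", ["Name A"]),
    ("http://a.example/4", ["Name A"]),
    ("http://b.example/x", ["Name B"]),
    ("http://c.example/y", ["Name B"]),
]

ranking = rank_names_by_server_count(sample_pages)
```

In this sample, a page count would favor "Name A" (four pages against three), while the host count favors "Name B" (three hosts against one), which is the kind of bias correction the abstract describes.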
Pages: 306-313
Number of pages: 8