Extracting Named Entities and Relating Them over Time Based on Wikipedia

Cited by: 1
Authors
Bhole, Abhijit [1]
Fortuna, Blaz [2]
Grobelnik, Marko [2]
Mladenic, Dunja [2]
Affiliations
[1] Indian Inst Technol, Bombay 400076, Maharashtra, India
[2] Jozef Stefan Inst, Ljubljana 1000, Slovenia
Source
INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS | 2007, Vol. 31, No. 4
Keywords
text mining; document categorization; information extraction
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper presents an approach to mining information about people, places, organizations and events extracted from Wikipedia and linking them on a time scale. The approach consists of two phases: (1) identifying relevant pages, by categorizing articles as describing people, places or organizations; and (2) generating a timeline, by linking named entities and extracting events and their time frames. We illustrate the proposed approach on 1.7 million Wikipedia articles.
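The two phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how articles are categorized or how events are extracted, so the cue-word scoring in `categorize`, the year-matching regular expression, and the names `CATEGORY_CUES`, `categorize`, and `build_timeline` are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical cue words per category. This is an assumed stand-in for
# whatever categorizer the paper actually uses (not stated in the abstract).
CATEGORY_CUES = {
    "person": ("born", "died", "politician", "author"),
    "place": ("city", "river", "capital", "located"),
    "organization": ("company", "founded", "institute", "headquartered"),
}

# Assumption: an "event" is any sentence that mentions a four-digit year.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def categorize(text):
    """Phase 1: return the category whose cue words occur most often, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        category: sum(words.count(cue) for cue in cues)
        for category, cues in CATEGORY_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def build_timeline(articles):
    """Phase 2: map each year to the (entity, sentence) pairs that mention it."""
    timeline = defaultdict(list)
    for title, text in articles.items():
        if categorize(text) is None:
            continue  # not a person, place, or organization page
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for year in YEAR.findall(sentence):
                timeline[int(year)].append((title, sentence.strip()))
    return dict(sorted(timeline.items()))
```

Calling `build_timeline` on a dictionary that maps article titles to article text returns a dictionary ordered by year. Articles that match no category are skipped, which mirrors the idea of filtering for relevant pages before the timeline is built.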
Pages: 463-468
Page count: 6