A tool for generating synthetic authorship records for evaluating author name disambiguation methods

Cited by: 15
Authors
Ferreira, Anderson A. [1, 2]
Goncalves, Marcos Andre [1]
Almeida, Jussara M. [1]
Laender, Alberto H. F. [1]
Veloso, Adriano [1]
Affiliations
[1] Univ Fed Minas Gerais, Dept Ciencia Comp, Belo Horizonte, MG, Brazil
[2] Univ Fed Ouro Preto, Dept Comp, Ouro Preto, MG, Brazil
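Keywords
Author name disambiguation; Digital library; Bibliographic citation; Synthetic generator
DOI
10.1016/j.ins.2012.04.022
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract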
The author name disambiguation task has to deal with uncertainties related to the possible many-to-many correspondences between ambiguous names and unique authors. Despite the variety of name disambiguation methods available in the literature, most of them are rarely compared against each other. Moreover, they are often evaluated without considering a time-evolving digital library, susceptible to dynamic (and therefore challenging) patterns such as the introduction of new authors and the change of researchers' interests over time. In order to facilitate the evaluation of name disambiguation methods in various realistic scenarios and under controlled conditions, in this article we propose SyGAR, a new Synthetic Generator of Authorship Records that generates citation records based on author profiles. SyGAR can be used to generate successive loads of citation records simulating a living digital library that evolves according to various publication patterns. We validate SyGAR by comparing the results produced by three representative name disambiguation methods on real as well as synthetically generated collections of citation records. We also demonstrate its applicability by evaluating those methods on a time-evolving digital library collection generated with the tool, considering several dynamic and realistic scenarios. (C) 2012 Elsevier Inc. All rights reserved.
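As a rough illustration of the idea of generating citation records from author profiles, the following minimal Python sketch samples coauthors, title terms, and a venue from per-author weight tables. It is not SyGAR's actual algorithm, which the abstract does not detail. The names `AuthorProfile`, `weighted_sample`, `generate_record`, and `generate_load`, the record fields, and the example data are all hypothetical and chosen only for this sketch.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AuthorProfile:
    """Hypothetical profile: attribute weights for one author (an assumption)."""
    name: str
    coauthors: Dict[str, float]
    title_terms: Dict[str, float]
    venues: Dict[str, float]


def weighted_sample(weights: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Draw up to k distinct keys, each pick proportional to its weight."""
    pool = dict(weights)
    picked = []
    for _ in range(min(k, len(pool))):
        keys = list(pool)
        choice = rng.choices(keys, weights=[pool[key] for key in keys], k=1)[0]
        picked.append(choice)
        del pool[choice]  # sample without replacement
    return picked


def generate_record(profile: AuthorProfile, rng: random.Random,
                    n_coauthors: int = 2, n_terms: int = 3) -> dict:
    """Build one synthetic citation record from a single profile."""
    return {
        "author": profile.name,
        "coauthors": weighted_sample(profile.coauthors, n_coauthors, rng),
        "title": " ".join(weighted_sample(profile.title_terms, n_terms, rng)),
        "venue": weighted_sample(profile.venues, 1, rng)[0],
    }


def generate_load(profiles: List[AuthorProfile], records_per_author: int,
                  rng: random.Random) -> List[dict]:
    """Generate one load of records; repeated calls mimic successive loads."""
    return [generate_record(p, rng) for p in profiles for _ in range(records_per_author)]


# Made-up example data, for illustration only.
profiles = [
    AuthorProfile(
        name="A. Silva",
        coauthors={"B. Costa": 3.0, "C. Lima": 1.0, "D. Rocha": 1.0},
        title_terms={"digital": 2.0, "library": 2.0, "search": 1.0, "index": 1.0},
        venues={"JCDL": 2.0, "SIGIR": 1.0},
    ),
    AuthorProfile(
        name="A. Silva Jr.",
        coauthors={"E. Nunes": 2.0, "F. Melo": 1.0},
        title_terms={"network": 2.0, "routing": 1.0, "latency": 1.0},
        venues={"INFOCOM": 1.0},
    ),
]
rng = random.Random(0)  # fixed seed so the run is repeatable
load = generate_load(profiles, records_per_author=3, rng=rng)
```

Changing the weights between calls to `generate_load`, or appending new profiles, would be one simple way to mimic shifting research interests and the arrival of new authors across loads.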
Pages: 42-62
Page count: 21