Construction of a large-scale test set for author disambiguation

Cited by: 31
Authors
Kang, In-Su [2]
Kim, Pyung [1]
Lee, Seungwoo
Jung, Hanmin
You, Beom-Jong
Affiliations
[1] KISTI, Knowledge Informat Ctr, Taejon 305806, South Korea
[2] Kyungsung Univ, Dept Comp Sci & Engn, Pusan 608736, South Korea
Keywords
Test set construction; Author disambiguation; Author ambiguity
DOI
10.1016/j.ipm.2010.10.001
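Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
080201 [Mechanical Manufacturing and Automation]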
Abstract
Author disambiguation resolves same-name author occurrences in bibliographic data into distinct namesakes. This enables author-centered searches and high-quality social network analysis. In an attempt to promote research in author disambiguation, KISTI has constructed a new large-scale test set for this field. This article describes the semi-manual procedure used to create it and its characteristics, especially in terms of author ambiguity and name diversity. In addition, the baseline performance of author clustering against the test set is provided. © 2010 Elsevier Ltd. All rights reserved.
Pages: 452-465
Number of pages: 14