Research on Entity Identification Techniques for Complex Data

Cited by: 19

Authors:

Wang Hongzhi (王宏志) [1]

Fan Wenfei (樊文飞) [1,2]

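Affiliations:

[1] School of Computer Science and Technology, Harbin Institute of Technology

[2] School of Informatics, University of Edinburgh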

Source:

Chinese Journal of Computers (计算机学报) | 2011 / Vol. 34 / No. 10

Keywords:

data quality; complex data; entity identification; XML; graph; complex network

DOI:

Not available

Chinese Library Classification (CLC) number:

TP311.13

Subject classification code:

1201

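Abstract:

Complex data are now widely used, and using them effectively requires managing their quality. Entity identification is a basic operation in data quality management: it finds the different descriptions of the same entity within a data set, and it can be used for tasks such as error detection and the discovery of inconsistent data. Because complex data carry rich structural information, entity identification on complex data differs from entity identification on traditional text and relational data, and it brings new technical challenges. This paper introduces the concepts and applications of entity identification on complex data, discusses in turn the principles of entity identification techniques for XML data, graph data, and complex networks, and finally outlines future research directions.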


Pages: 1843-1852

Page count: 10
