Adaptive name matching in information integration

Cited by: 202
Authors
Bilenko, M
Mooney, R
Cohen, W
Ravikumar, P
Fienberg, S
Affiliations
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
[2] Carnegie Mellon Univ, Ctr Automated Learning & Discovery, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[4] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[5] Carnegie Mellon Univ, Ctr Comp & Commun Secur, Pittsburgh, PA 15213 USA
Funding
Andrew W. Mellon Foundation (USA); National Science Foundation (USA)
Keywords
DOI
10.1109/MIS.2003.1234765
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article discusses different approaches to the name-matching problem in information integration. It focuses on methods that adapt to a specific domain by combining multiple string similarity measures, each capturing a different notion of similarity. Edit-distance metrics are widely used, and many variations are possible. The authors propose an adaptive version of edit distance with affine gaps.
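
To make the affine-gap idea concrete, here is a minimal Python sketch of edit distance in which a run of k consecutive inserted or deleted characters costs gap_open + (k - 1) * gap_extend rather than k separate unit edits. It is an illustration only, not the authors' implementation: the function name, the parameter names, and the fixed cost values are assumptions, and an adaptive variant would presumably tune such costs to the domain instead of fixing them by hand.

```python
import math


def affine_gap_distance(s, t, sub=1.0, gap_open=1.0, gap_extend=0.5):
    """Edit distance with affine gap costs (hypothetical helper, fixed costs).

    A gap of k consecutive characters costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(s), len(t)
    inf = math.inf
    # Three tables, one per "last operation" of the alignment of s[:i] and t[:j]:
    # M ends in a match or substitution, D ends in a deletion from s,
    # I ends in an insertion from t.
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    I = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        I[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            M[i][j] = cost + min(M[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1])
            # Continuing an existing gap pays gap_extend; starting one pays gap_open.
            D[i][j] = min(M[i - 1][j] + gap_open,
                          D[i - 1][j] + gap_extend,
                          I[i - 1][j] + gap_open)
            I[i][j] = min(M[i][j - 1] + gap_open,
                          I[i][j - 1] + gap_extend,
                          D[i][j - 1] + gap_open)
    return min(M[n][m], D[n][m], I[n][m])
```

With gap_extend set equal to gap_open, the function reduces to ordinary unit-cost edit distance. A smaller gap_extend makes one long gap cheaper than several scattered edits, which suits names that differ by an omitted or abbreviated token.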
Pages: 16-23
Page count: 8