Semantic Labeling of Online Information Sources

Cited by: 11
Authors
Lerman, Kristina [1]
Plangprasopchok, Anon [1]
Knoblock, Craig A. [1]
Affiliations
[1] Univ So Calif, Inst Sci Informat, Los Angeles, CA 90089 USA
Keywords
data integration; data modeling; data semantics; semantic data model;
DOI
10.4018/jswis.2007070102
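Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology];
Abstract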
In order to combine data from various heterogeneous sources, software agents must first understand the semantics of the sources, expressed in the source model. Currently, source modeling is manual, but as large numbers of sources come online, it is impractical to expect users to continue modeling them by hand. We describe two machine learning techniques for automatically modeling information sources: one that uses a source's metadata, contained in a Web Service Definition file, and one that uses the source's content, to classify the semantics of the data it uses. We go beyond previous work and verify predictions by invoking the source with sample data of the predicted type. We provide performance results for both methods and validate our approach on several live Web sources. In addition, we describe the application of semantic modeling within the CALO project.
Pages: 36-56
Page count: 21
Related papers
24 in total
[1]
[Anonymous], 2005, MACHINE LEARNING, Patent No. 589310
[2]
BAKER LD, 1998, P ACM SIG INF RETR S
[3]
BLYTHE J, 2007, BUILDING INFORM INTE
[4]
CARMAN M, 2007, P INT JOINT C ART IN
[5]
CHAKRABARTI S, 1998, P SIGMOD 98
[6]
Learning to match the schemas of data sources: A multistrategy approach [J].
Doan, A ;
Domingos, P ;
Halevy, A .
MACHINE LEARNING, 2003, 50 (03) :279-301
[7]
Doan AnHai., 2001, ACM Sigmod Record, V30, P509, DOI DOI 10.1145/375663.375731
[8]
DONG X, 2004, P INT C VER LARG DAT
[9]
HESS A, 2003, P 2 INT SEM WEB C
[10]
Hess Andreas, 2004, P 3 INT SEM WEB C IS