Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion

Cited by: 938
Authors
Dong, Xin Luna [1]
Gabrilovich, Evgeniy [1]
Heitz, Geremy [1]
Horn, Wilko [1]
Lao, Ni [1]
Murphy, Kevin [1]
Strohmann, Thomas [1]
Sun, Shaohua [1]
Zhang, Wei [1]
Affiliation
[1] Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA
Source
PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14) | 2014
Keywords
Knowledge bases; information extraction; probabilistic models; machine learning; BASE;
DOI
10.1145/2623330.2623623
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.
Pages: 601-610
Page count: 10