Bootstrapping Distributional Feature Vector Quality

Cited by: 25
Authors
Zhitomirsky-Geffet, Maayan [1 ]
Dagan, Ido [2 ]
Affiliations
[1] Bar Ilan Univ, Dept Informat Sci, Ramat Gan, Israel
[2] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
Keywords
SIMILARITY
DOI
10.1162/coli.08-032-R1-06-96
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to use distributional similarity to identify the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distributional similarity methods is the insufficient quality of the word feature vectors, caused by deficient feature weighting. This observation led to the definition of a bootstrapping scheme that yields improved feature weights, and hence higher-quality feature vectors. The underlying idea of our approach is that features that are common to similar words are also the most characteristic of their meanings, and thus should be promoted. This idea is realized via a bootstrapping step applied to an initial standard approximation of the similarity space. The superior performance of the bootstrapping method was demonstrated in two experiments, one based on direct human gold-standard annotation and the other based on an automatically created disambiguation dataset. These results are further supported by a novel quantitative measure of the quality of feature weighting functions. Improved feature weighting also allows massive feature reduction, which indicates that the most characteristic features of a word are indeed concentrated at the top ranks of its vector. Finally, experiments with three prominent similarity measures and two feature weighting functions showed that the bootstrapping scheme is robust and independent of the original functions to which it is applied.
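The abstract states the core idea (features shared by a word's most similar words are promoted, using an initial similarity space) but not the exact weighting formula. The following is a minimal sketch under our own assumptions, not the authors' function: cosine similarity stands in for the initial similarity measure, and the new weight of feature f for word w is taken to be the sum of the initial similarities between w and those of its top-n neighbors whose vectors also contain f. The names `bootstrap_weights`, `nearest_neighbors`, `n_neighbors` and `top_k` are hypothetical; `top_k` is a crude stand-in for the feature-reduction step.

```python
"""Sketch of a bootstrapped feature re-weighting step (hypothetical names)."""
import math
from typing import Dict, Optional

Vector = Dict[str, float]  # sparse vector: feature -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbors(vectors: Dict[str, Vector], word: str, n: int) -> Dict[str, float]:
    """Return up to n words most similar to `word` (positive scores only)."""
    scores = {other: cosine(vectors[word], vec)
              for other, vec in vectors.items() if other != word}
    # Sort by descending similarity, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return {other: s for other, s in ranked if s > 0}


def bootstrap_weights(vectors: Dict[str, Vector], n_neighbors: int = 10,
                      top_k: Optional[int] = None) -> Dict[str, Vector]:
    """Re-weight each word's features by how strongly its neighbors share them.

    Assumed rule (not the paper's exact formula): the new weight of feature f
    for word w is the sum of initial similarities sim(w, v) over w's nearest
    neighbors v whose vectors also contain f. Features shared by no neighbor
    get weight 0 and are dropped. If top_k is set, only the top_k features
    per word are kept, as a stand-in for feature reduction.
    """
    new_vectors: Dict[str, Vector] = {}
    for word, vec in vectors.items():
        neighbors = nearest_neighbors(vectors, word, n_neighbors)
        weights: Vector = {}
        for f in vec:
            score = sum(s for v, s in neighbors.items() if f in vectors[v])
            if score > 0:
                weights[f] = score
        if top_k is not None:
            ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
            weights = dict(ranked)
        new_vectors[word] = weights
    return new_vectors


if __name__ == "__main__":
    # Toy initial vectors with uniform weights (illustrative data only).
    toy = {
        "dog": {"bark": 1.0, "pet": 1.0, "leash": 1.0, "run": 1.0},
        "cat": {"pet": 1.0, "meow": 1.0, "run": 1.0},
        "wolf": {"bark": 1.0, "run": 1.0, "forest": 1.0},
        "car": {"run": 1.0, "engine": 1.0, "road": 1.0},
    }
    boosted = bootstrap_weights(toy, n_neighbors=2)
    print(boosted["dog"])
    # Similarities can then be recomputed over the re-weighted vectors.
    print(cosine(boosted["dog"], boosted["wolf"]))
```

On the toy data, the idiosyncratic `leash` feature of `dog` receives no weight because neither of its two nearest neighbors has it. The toy also exposes a limitation of this simplified rule: `run`, a generic feature shared by both neighbors, ends up with the highest weight, so the sketch should not be read as reproducing the behavior or results reported in the article.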
Pages: 435-461
Page count: 27