Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data

Cited by: 3
Authors
Seong-Bae Park
Byoung-Tak Zhang
Yung Taek Kim
Affiliation
[1] Seoul National University, Biointelligence Lab, School of Computer Science and Engineering
Source
Applied Intelligence | 2003 / Vol. 19
Keywords
word sense disambiguation; learning from unlabeled examples; selective sampling; committee learning; decision tree
DOI
Not available
Abstract
In this paper we describe a machine learning approach to word sense disambiguation that uses unlabeled data. Our method is based on selective sampling with committees of decision trees. The committee members are trained on a small set of labeled examples, which is then augmented with a large number of unlabeled examples. Using unlabeled examples is important because obtaining labeled data is expensive and time-consuming, whereas collecting a large number of unlabeled examples is easy and inexpensive. The idea behind this approach is that the labels of unlabeled examples can be estimated by committees. Using additional unlabeled examples therefore improves the performance of word sense disambiguation and minimizes the cost of manual labeling. The effectiveness of this approach was examined on a raw corpus of one million words. Using unlabeled data, we achieved an accuracy improvement of up to 20.2%.
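The committee idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-level trees (stumps), bootstrap resampling, the binary context features, and the names `train_stump`, `train_committee`, `augment`, and `min_agreement` are all assumptions made for illustration. Each committee member votes on an unlabeled example, and the example is added to the training set with the majority label only when enough members agree.

```python
import random
from collections import Counter

def train_stump(examples):
    """Fit a one-level decision tree (assumed stand-in for a full tree):
    choose the feature whose split makes the fewest training errors."""
    n_features = len(examples[0][0])
    default = Counter(y for _, y in examples).most_common(1)[0][0]
    best = None
    for f in range(n_features):
        by_value = {}
        for x, y in examples:
            by_value.setdefault(x[f], Counter())[y] += 1
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[0]:
            leaves = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
            best = (errors, f, leaves)
    _, f, leaves = best
    # Unseen feature values fall back to the majority sense of the sample.
    return lambda x: leaves.get(x[f], default)

def train_committee(labeled, size, rng):
    """Train each member on a bootstrap resample of the labeled set
    (an assumed way to make the members differ)."""
    return [train_stump([rng.choice(labeled) for _ in labeled])
            for _ in range(size)]

def augment(labeled, unlabeled, size=5, min_agreement=0.8, seed=0):
    """Return (augmented labeled set, examples the committee disagreed on)."""
    rng = random.Random(seed)
    committee = train_committee(labeled, size, rng)
    augmented, uncertain = list(labeled), []
    for x in unlabeled:
        votes = Counter(member(x) for member in committee)
        label, count = votes.most_common(1)[0]
        if count / size >= min_agreement:
            augmented.append((x, label))   # estimated label, not a gold label
        else:
            uncertain.append(x)            # left for manual labeling
    return augmented, uncertain

# Hypothetical data for the word "bank": features are
# ("money" occurs nearby, "river" occurs nearby).
labeled = [((1, 0), "finance"), ((1, 0), "finance"),
           ((0, 1), "river"), ((0, 1), "river")]
unlabeled = [(1, 0), (0, 1), (1, 1)]
augmented, uncertain = augment(labeled, unlabeled)
```

Raising `min_agreement` makes the committee more conservative: fewer unlabeled examples receive estimated labels and more are set aside.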
Pages: 27-38
Number of pages: 11