Unsupervised word sense disambiguation with N-gram features

Times cited: 10
Authors
Preotiuc-Pietro, Daniel [1]
Hristea, Florentina [2]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Bucharest, Dept Comp Sci, Bucharest 010014, Romania
Keywords
Bayesian classification; The EM algorithm; Word sense disambiguation; Unsupervised disambiguation; Web-scale N-grams
DOI
10.1007/s10462-011-9306-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The present paper concentrates on the issue of feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. By creating features from unlabeled data, we "help" a simple, knowledge-lean disambiguation algorithm significantly increase its accuracy by supplying it with easily obtainable knowledge. The performance of this method is compared to that of other methods that rely on completely different feature sets. Test results concerning nouns, adjectives and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a "quality list" of features, adapted to the part of speech, is used.
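The abstract describes unsupervised WSD built on a Naïve Bayes model, and the keywords name the EM algorithm as the training procedure. To make that combination concrete, here is a minimal sketch of a Bernoulli Naïve Bayes mixture fit with EM. It clusters occurrences of an ambiguous word into a fixed number of senses using binary context features. This is not the authors' implementation. Several parts are assumptions made only for illustration: the Bernoulli event model, the add-one smoothing, the random soft initialisation, and the toy "bank" features. In particular, the toy features stand in for the web N-gram features the paper actually derives, and the function name `em_naive_bayes` is hypothetical.

```python
import math
import random


def em_naive_bayes(contexts, vocab, n_senses, n_iter=30, seed=0):
    """Hypothetical sketch: cluster word occurrences into senses with a
    Bernoulli Naive Bayes mixture trained by EM (not the paper's code).

    contexts: list of sets; each set holds the binary features observed
              around one occurrence of the ambiguous word.
    vocab:    list of all feature names the model uses.
    Returns (priors, cond, post):
      priors[s]  ~ P(sense s)
      cond[s][f] ~ P(feature f present | sense s)
      post[i][s] ~ P(sense s | context i)
    """
    rng = random.Random(seed)
    n = len(contexts)

    # Random soft initialisation of the sense posteriors breaks the symmetry
    # between otherwise interchangeable senses.
    post = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n_senses)]
        z = sum(w)
        post.append([x / z for x in w])

    for _ in range(n_iter):
        # M-step: re-estimate parameters from expected counts; add-one
        # smoothing (an assumption) keeps every probability strictly in (0, 1).
        masses = [sum(p[s] for p in post) for s in range(n_senses)]
        priors = [(masses[s] + 1.0) / (n + n_senses) for s in range(n_senses)]
        cond = []
        for s in range(n_senses):
            cond.append({
                f: (sum(p[s] for p, c in zip(post, contexts) if f in c) + 1.0)
                / (masses[s] + 2.0)
                for f in vocab
            })

        # E-step: posterior over senses for each context, in log space to
        # avoid underflow when the feature vocabulary is large.
        post = []
        for c in contexts:
            logs = []
            for s in range(n_senses):
                lp = math.log(priors[s])
                for f in vocab:
                    q = cond[s][f]
                    lp += math.log(q) if f in c else math.log(1.0 - q)
                logs.append(lp)
            m = max(logs)
            w = [math.exp(l - m) for l in logs]
            z = sum(w)
            post.append([x / z for x in w])

    return priors, cond, post


# Toy usage with made-up features for two senses of "bank" (illustrative only).
contexts = [
    {"money", "loan"}, {"loan", "deposit"}, {"money", "deposit"},
    {"river", "water"}, {"water", "shore"}, {"river", "shore"},
]
vocab = sorted(set().union(*contexts))
priors, cond, post = em_naive_bayes(contexts, vocab, n_senses=2)
# Each occurrence is assigned the sense with the highest posterior.
senses = [max(range(2), key=lambda s: p[s]) for p in post]
```

EM only guarantees a local optimum of the likelihood, so the learned clusters depend on the initialisation and on how informative the features are. That dependence is why the choice of feature set, the subject of the paper, matters so much for accuracy.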
Pages: 241-260
Number of pages: 20