Unsupervised learning of the morphology of a natural language

Cited by: 242
Author
Goldsmith, J [1]
Affiliation
[1] Univ Chicago, Dept Linguist, Chicago, IL 60637 USA
DOI
10.1162/089120101750300490
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar matches well the analysis that would be developed by a human morphologist. In the final section, we discuss the relationship of this style of MDL grammatical analysis to the notion of evaluation metric in early generative grammar.
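The role MDL plays in the abstract, deciding whether a proposed change to the grammar is adopted, can be illustrated with a toy comparison. The sketch below is not the paper's formulation: the cost function, the word list `WORDS`, the suffix list `SUFFIXES`, and the helpers `whole` and `stem_suffix` are all hypothetical, chosen only to show the general two-part idea (bits to write down the lexicon, plus bits to encode the corpus using that lexicon).

```python
import math
from collections import Counter

# Hypothetical toy corpus and suffix list; neither comes from the paper.
WORDS = ["walk", "walks", "walked", "walking",
         "jump", "jumps", "jumped", "jumping",
         "talk", "talks", "talked", "talking"]
SUFFIXES = ("ing", "ed", "s")


def whole(word):
    """Baseline analysis: every word is a single unanalyzed morph."""
    return (word,)


def stem_suffix(word):
    """Candidate analysis: strip one listed suffix if a stem of 3+ letters remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return (word[:-len(suffix)], suffix)
    return (word,)


def description_length(words, analyze):
    """Toy two-part description length in bits (an assumed, simplified cost)."""
    morph_counts = Counter(m for w in words for m in analyze(w))
    alphabet = {ch for w in words for ch in w}
    # Each character costs a fixed number of bits; +1 symbol for an end marker.
    bits_per_char = math.log2(len(alphabet) + 1)
    # Model cost: spell out each distinct morph once, plus its end marker.
    model_bits = sum((len(m) + 1) * bits_per_char for m in morph_counts)
    # Data cost: each morph token costs -log2 of its relative frequency.
    total = sum(morph_counts.values())
    data_bits = sum(-c * math.log2(c / total) for c in morph_counts.values())
    return model_bits + data_bits


baseline = description_length(WORDS, whole)
candidate = description_length(WORDS, stem_suffix)
# Adopt the candidate analysis only if it shortens the total description.
adopt = candidate < baseline
print(f"baseline={baseline:.1f} bits, candidate={candidate:.1f} bits, adopt={adopt}")
```

On this toy list the stem-plus-suffix analysis yields the shorter description, because the shared stems and suffixes are spelled out once instead of once per word form. A real system would need a far more careful encoding than this fixed per-character cost.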
Pages: 153-198
Page count: 46