Automatic Language Classification by means of Syntactic Dependency Networks

Cited by: 39
Authors
Abramov, Olga [1]
Mehler, Alexander [2]
Affiliations
[1] Univ Bielefeld, Fac Linguist & Literature, D-33615 Bielefeld, Germany
[2] Goethe Univ Frankfurt, Frankfurt, Germany
DOI
10.1080/09296174.2011.608602
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This article presents an approach to automatic language classification by means of linguistic networks. Networks of 11 languages were constructed from dependency treebanks, and the topology of these networks serves as input to the classification algorithm. The results match the genealogical similarities of these languages. In addition, we test two alternative approaches to automatic language classification - one based on n-grams and the other on quantitative typological indices. All three methods show good results in identifying genealogical groups. Beyond genetic similarities, network features (and feature combinations) offer a new source of typological information about languages. This information can contribute to a better understanding of the interplay of single linguistic phenomena observed in language.
Pages: 291-336
Page count: 46