Using complex networks to quantify consistency in the use of words

Cited by: 13
Authors
Amancio, D. R. [1]
Oliveira, O. N., Jr. [1]
da F Costa, L. [1]
Affiliations
[1] Univ Sao Paulo, Inst Phys Sao Carlos, BR-13560970 Sao Carlos, SP, Brazil
Source
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT | 2012
Funding
São Paulo Research Foundation, Brazil;
Keywords
data mining (experiment); pattern formation (experiment); random graphs; networks; communication; supply and information networks; LANGUAGE; CHARACTER; BIGRAMS; LENGTH; WORLD;
DOI
10.1088/1742-5468/2012/01/P01004
Chinese Library Classification number
O3 [Mechanics];
Discipline classification code
08; 0801;
Abstract
In this paper we have quantified the consistency of word usage in written texts represented by complex networks, where words were taken as nodes, by measuring the degree of preservation of the node neighborhood. Words were considered highly consistent if the authors used them with the same neighborhood. When ranked according to the consistency of use, the words obeyed a log-normal distribution, in contrast to Zipf's law that applies to the frequency of use. Consistency correlated positively with the familiarity and frequency of use, and negatively with ambiguity and age of acquisition. An inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition. Indeed, as expected, authors of novels could be distinguished from those who wrote scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied in other tasks, such as emotion recognition.
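The abstract describes consistency as the degree to which a word's network neighborhood is preserved, but this record does not give the index itself. The following is only a minimal sketch under stated assumptions: each text is turned into a word-adjacency network (an edge links words that appear next to each other), and consistency is approximated by the mean pairwise Jaccard overlap of a word's neighbor sets across texts. The function names `neighborhoods` and `consistency`, and the choice of Jaccard overlap, are illustrative and are not taken from the paper.

```python
from itertools import combinations


def neighborhoods(tokens):
    """Map each word to the set of words adjacent to it in the token sequence.

    Assumption: the network links consecutive words; self-loops are skipped.
    """
    nbrs = {}
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def consistency(word, texts):
    """Mean pairwise Jaccard overlap of the word's neighbor sets across texts.

    Assumption: this stands in for the paper's consistency index, which is
    not specified in the abstract. Returns None if the word has neighbors
    in fewer than two texts.
    """
    sets = [neighborhoods(tokens).get(word, set()) for tokens in texts]
    sets = [s for s in sets if s]
    if len(sets) < 2:
        return None
    scores = [len(a & b) / len(a | b) for a, b in combinations(sets, 2)]
    return sum(scores) / len(scores)


# Toy usage with two tokenized "texts" (hypothetical data).
texts = [["the", "red", "fox"], ["a", "red", "fox"]]
fox_score = consistency("fox", texts)  # same single neighbor in both texts
red_score = consistency("red", texts)  # neighbors differ on one side
```

Under this sketch, a word that always appears with the same neighbors scores 1, and a word whose neighbors never repeat across texts scores 0, which matches the abstract's notion that highly consistent words are used "with the same neighborhood".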
Pages: 20