A corpus-based study on random textual vocabulary coverage

Cited by: 11
Authors
Fan Fengxiang [1]
Affiliations
[1] Dalian Maritime Univ, Dalian, Peoples R China
Keywords
corpus; domain; random textual vocabulary coverage; Brunet's model
DOI
10.1515/CLLT.2008.001
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper investigates the random textual vocabulary coverage of the nine domains in the British National Corpus (BNC), using sets of five hundred 2000-word samples randomly drawn from each domain. Random textual vocabulary coverage is the coverage by the vocabulary of a text or collection of texts of any given length over that of another text or collection of texts of any given length. Estimating random textual vocabulary coverage depends on determining the relationship between vocabulary size and text length. Brunet's model proves robust in capturing this relationship. A mathematical estimator for random textual vocabulary coverage that incorporates Brunet's model is developed. A method for computing the 95% random textual vocabulary coverage interval is also devised.
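The coverage notion defined in the abstract can be illustrated empirically. The sketch below is not the paper's estimator, which is model-based and built on Brunet's model. It only measures coverage directly on random samples and reads a 95% interval off the sorted results. Several choices here are assumptions made for illustration: coverage is counted over tokens (the share of target tokens whose word type occurs in the source sample), tokenization is lowercase whitespace splitting, and the function names `tokenize`, `vocabulary_coverage`, `draw_sample`, and `coverage_interval` are hypothetical.

```python
import random


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real corpus work
    # would use the corpus's own tokenization or lemmatization.
    return text.lower().split()


def vocabulary_coverage(source_tokens, target_tokens):
    """Share of target tokens whose word type also occurs in the source."""
    if not target_tokens:
        raise ValueError("target text is empty")
    source_vocab = set(source_tokens)
    covered = sum(1 for tok in target_tokens if tok in source_vocab)
    return covered / len(target_tokens)


def draw_sample(tokens, length, rng):
    """Contiguous sample of `length` tokens starting at a random offset."""
    if length > len(tokens):
        raise ValueError("sample length exceeds corpus length")
    start = rng.randrange(len(tokens) - length + 1)
    return tokens[start:start + length]


def coverage_interval(tokens, length, n_pairs, rng, level=0.95):
    """Empirical percentile interval of coverage over random sample pairs."""
    values = sorted(
        vocabulary_coverage(
            draw_sample(tokens, length, rng),
            draw_sample(tokens, length, rng),
        )
        for _ in range(n_pairs)
    )
    tail = (1 - level) / 2
    lower = values[int(tail * (n_pairs - 1))]
    upper = values[int((1 - tail) * (n_pairs - 1))]
    return lower, upper
```

With the tokens of one domain loaded into a list, a call such as `coverage_interval(tokens, 2000, 500, random.Random(0))` would mirror the sample size and sample count mentioned in the abstract. A type-based variant would divide the number of shared types by the number of target types instead.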
Pages: 1-17
Number of pages: 17