Information retrieval on Turkish texts

Cited by: 58

Authors:

Can, Fazli [1]

Kocberber, Seyit [1]

Balcik, Erman [1]

Kaynak, Cihan [1]

Ocalan, H. Cagdas [1]

Vursavas, Onur M. [1]

Affiliations:

[1] Bilkent Univ, Dept Comp Engn, Bilkent Informat Retrieval Grp, TR-06800 Ankara, Turkey

Source:

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2008 / Vol. 59 / Issue 03

Keywords:

DOI:

10.1002/asi.20750

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on the retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing.


Pages: 407-421

Number of pages: 15

