A SYSTEMATIC-APPROACH TO COMPRESSING A FULL-TEXT RETRIEVAL-SYSTEM

Cited by: 16
Authors
BOOKSTEIN, A [1]
KLEIN, ST [1]
ZIFF, DA [1]
Affiliations
[1] BAR ILAN UNIV,DEPT MATH & COMP SCI,IL-52900 RAMAT GAN,ISRAEL
DOI
10.1016/0306-4573(92)90069-C
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This article reports on a variety of compression algorithms developed in the context of a project to put all the data files for a full-text retrieval system on CD-ROM. In the context of inexpensive pre-processing, a text-compression algorithm is presented that is based on Markov-modeled Huffman coding on an extended alphabet. Data structures are examined for facilitating random access into the compressed text. In addition, new algorithms are presented for compression of word indices, both the dictionaries (word lists) and the text pointers (concordances). The ARTFL database is used as a test case throughout the article.
Pages: 795-806
Page count: 12
Related papers
15 in total
[1]  
BELL T, 1989, COMPUT SURV, V21, P557, DOI 10.1145/76894.76896
[2]   USING BITMAPS FOR MEDIUM-SIZED INFORMATION-RETRIEVAL SYSTEMS [J].
BOOKSTEIN, A ;
KLEIN, ST .
INFORMATION PROCESSING & MANAGEMENT, 1990, 26 (04) :525-533
[3]   COMPRESSION, INFORMATION-THEORY, AND GRAMMARS - A UNIFIED APPROACH [J].
BOOKSTEIN, A ;
KLEIN, ST .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1990, 8 (01) :27-49
[4]  
BOOKSTEIN A, 1991, RIAO C BARCELONA SPA
[5]  
BOOKSTEIN A, 1990, COMP ANAL MODELS CHR
[6]   PROCESSING TRUNCATED TERMS IN DOCUMENT-RETRIEVAL SYSTEMS [J].
BRATLEY, P ;
CHOUEKA, Y .
INFORMATION PROCESSING & MANAGEMENT, 1982, 18 (05) :257-266
[7]  
CHOUEKA Y, 1988, 11TH ACM SIGIR C GRE
[8]  
CHOUEKA Y, 1987, 10TH ACM SIGIR C NEW
[9]  
CICHOCKI EM, 1988, J AM SOC INFORM SCI, V39, P43, DOI 10.1002/(SICI)1097-4571(198801)39:1<43::AID-ASI15>3.0.CO;2-M