Off-line compression by greedy textual substitution

被引:52
作者
Apostolico, A [1 ]
Lonardi, S [1 ]
机构
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
基金
英国工程与自然科学研究理事会; 美国国家科学基金会;
关键词
augmented suffix tree; compression of biological sequences; dynamic text compression; grammatical inference; off-line textual substitution; substring statistics;
D O I
10.1109/5.892709
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Greedy off-line textual substitution refers to the following approach to compression or structural inference. Given a long textstring x, a substring w is identifies such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted textstring until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.
引用
收藏
页码:1733 / 1744
页数:12
相关论文
共 45 条
[1]   MINIMUM MESSAGE LENGTH ENCODING AND THE COMPARISON OF MACROMOLECULES [J].
ALLISON, L ;
YEE, CN .
BULLETIN OF MATHEMATICAL BIOLOGY, 1990, 52 (03) :431-453
[2]  
ALLISON L, 1998, INTELLIGENT SYSTEMS, P8
[3]  
[Anonymous], 1988, DATA COMPRESSION MET
[4]   ROBUST TRANSMISSION OF UNBOUNDED STRINGS USING FIBONACCI REPRESENTATIONS [J].
APOSTOLICO, A ;
FRAENKEL, AS .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1987, 33 (02) :238-245
[5]   SELF-ALIGNMENTS IN WORDS AND THEIR APPLICATIONS [J].
APOSTOLICO, A ;
SZPANKOWSKI, W .
JOURNAL OF ALGORITHMS, 1992, 13 (03) :446-467
[6]  
Apostolico A, 1996, ALGORITHMICA, V15, P481, DOI 10.1007/BF01955046
[7]  
APOSTOLICO A, 1983, P 21 ALL C COMM CONT, P70
[8]  
Bell T. C., 1990, TEXT COMPRESSION
[9]   Data compression using long common strings [J].
Bentley, J ;
McIlroy, D .
DCC '99 - DATA COMPRESSION CONFERENCE, PROCEEDINGS, 1999, :287-295
[10]   EFFICIENT PARALLEL ALGORITHMS TO TEST SQUARE-FREENESS AND FACTORIZE STRINGS [J].
CROCHEMORE, M ;
RYTTER, W .
INFORMATION PROCESSING LETTERS, 1991, 38 (02) :57-60