A NEW DISTANCE METRIC ON STRINGS COMPUTABLE IN LINEAR TIME

被引:15
作者
EHRENFEUCHT, A [1 ]
HAUSSLER, D [1 ]
机构
[1] UNIV CALIF SANTA CRUZ,DEPT COMP & INFORMAT SCI,SANTA CRUZ,CA 95064
基金
美国国家科学基金会;
关键词
DATA PROCESSING;
D O I
10.1016/0166-218X(88)90076-5
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
We describe a new metric for sequence comparison that emphasizes global similarity over sequential matching at the local level. It has the advantage over the Levenshtein metric that strings of lengths n and m can be compared in time proportional to n plus m instead of nm. Various mathematical properties of the metric are established. These metrics have potential applications in file management, in the calculation of differences and the creation of version trees among sets of files, as well as in text editing.
引用
收藏
页码:191 / 203
页数:13
相关论文
共 9 条
[1]  
BACON MD, 1973, DATA TRANSMISSION
[2]   THE SMALLEST AUTOMATION RECOGNIZING THE SUBWORDS OF A TEXT [J].
BLUMER, A ;
BLUMER, J ;
HAUSSLER, D ;
EHRENFEUCHT, A ;
CHEN, MT ;
SEIFERAS, J .
THEORETICAL COMPUTER SCIENCE, 1985, 40 (01) :31-55
[3]   SEQUENCE LANDSCAPES [J].
CLIFT, B ;
HAUSSLER, D ;
MCCONNELL, R ;
SCHNEIDER, TD ;
STORMO, GD .
NUCLEIC ACIDS RESEARCH, 1986, 14 (01) :141-158
[4]   TECHNIQUE FOR ISOLATING DIFFERENCES BETWEEN FILES [J].
HECKEL, P .
COMMUNICATIONS OF THE ACM, 1978, 21 (04) :264-268
[5]   A FASTER ALGORITHM COMPUTING STRING EDIT DISTANCES [J].
MASEK, WJ ;
PATERSON, MS .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1980, 20 (01) :18-31
[6]   SPACE-ECONOMICAL SUFFIX TREE CONSTRUCTION ALGORITHM [J].
MCCREIGHT, EM .
JOURNAL OF THE ACM, 1976, 23 (02) :262-272
[7]  
Sankoff D., 1983, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
[8]   IDENTIFICATION OF COMMON MOLECULAR SUBSEQUENCES [J].
SMITH, TF ;
WATERMAN, MS .
JOURNAL OF MOLECULAR BIOLOGY, 1981, 147 (01) :195-197
[9]   THE STRING-TO-STRING CORRECTION PROBLEM WITH BLOCK MOVES [J].
TICHY, WF .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1984, 2 (04) :309-321