PARTIAL-MATCH RETRIEVAL VIA THE METHOD OF SUPERIMPOSED CODES

被引:102
作者
ROBERTS, CS
机构
[1] Bell Laboratories, Holmdel
关键词
D O I
10.1109/PROC.1979.11543
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
This paper presents and analyzes an effective and practical method of accomplishing partial-match retrieval on a computer file containing a large number of information records. In partial-match retrieval a subset of the records in the file is selected and retrieved by specifying a query set consisting of a small number of key values; the records selected and retrieved are those having a match to all the key values in the query set. Partial-match retrieval can be a powerful capability when used in information retrieval systems, and a stored file augmented with such a capability is equivalent to an associative or content-addressable store. The method presented in this paper is based upon the use of superimposed codes, a technique used previously with some success in notched-edge card filing systems in which card selection was accomplished mechanically with long metal needles. A new algorithm is presented for generating superimposed codes without the use of a stored code dictionary. The new algorithm executes rapidly on a digital computer and employs a hash function and a pseudorandom number generator; it allows the generation of binary codes having any desired width and weight. It is pointed out that the use of variable-weight binary codes gives an advantage on files in which the key values occur with substantially different frequencies. It is shown that organizing the bits of the superimposed code words properly in storage leads to the property that only a small fraction of these bits need be retrieved and processed on each query; this substantially reduces the time required to execute a query. Also, because of the simplicity of the bit operations required in the processing of superimposed code words, a very fast query processor could be designed and built with currently available digital hardware devices and techniques. With the query processor implemented in such digital hardware, partial-match query rates as high as 100/s or even higher appear to be feasible. In order to demonstrate experimentally the power of the method of superimposed codes, the method was implemented in software on a file consisting of the Suffolk County, NY, telephone-directory listings. Copyright © 1979 by The Institute of Electrical and Electronics Engineers, Inc.
引用
收藏
页码:1624 / 1642
页数:19
相关论文
共 44 条
[1]  
BERLEKAMP ER, 1968, ALGEBRAIC CODING THE, P6
[2]  
BEYER JD, UNPUBLISHED
[3]   SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-&
[4]  
BLOOM BH, 1969, P ACM NAT C, V24, P83
[5]  
BOURNE CP, 1963, METHODS INFORMATION
[6]  
Brenner C. W., 1958, PUNCHED CARDS, P340
[7]  
Burkhard W. A., 1976, ACM Transactions on Database Systems, V1, P175, DOI 10.1145/320455.320469
[8]  
CASEY RS, 1958, PUNCHED CARDS THEIR
[9]   FOURIER ANALYSIS OF UNIFORM RANDOM NUMBER GENERATORS [J].
COVEYOU, RR ;
MACPHERSON, RD .
JOURNAL OF THE ACM, 1967, 14 (01) :100-+
[10]   SECONDARY KEY RETRIEVAL USING AN IBM 7090-1301 SYSTEM [J].
DAVIS, DR ;
LIN, AD .
COMMUNICATIONS OF THE ACM, 1965, 8 (04) :243-&