AN ALGEBRA FOR STRUCTURED TEXT SEARCH AND A FRAMEWORK FOR ITS IMPLEMENTATION

Cited by: 53
Authors
CLARKE, CLA
CORMACK, GV
BURKOWSKI, FJ
Affiliation
[1] Department of Computer Science, University of Waterloo, Waterloo
DOI
10.1093/comjnl/38.1.43
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A query algebra is presented that expresses searches on structured text. In addition to traditional full-text Boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup. The algebra has seven operators, which combine intervals to yield new ones: containing, not containing, contained in, not contained in, one of, both of, followed by. The ultimate result of a query is the set of intervals that satisfy it. An implementation framework is given based on four primitive access functions. Each access function finds the solution to a query nearest to a given position in the database. Recursive definitions for the seven operators are given in terms of these access functions. Search time is at worst proportional to the time required to evaluate the access functions for occurrences of the elementary terms in a query.
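To make the seven operators concrete, the following is a minimal sketch in Python, not the authors' implementation. It makes several assumptions that the abstract does not state: intervals are inclusive `(start, end)` pairs of word positions, operator results are reduced to minimal intervals (those that do not contain another interval of the result), and every operator is evaluated naively over whole lists rather than through the access-function framework. All function names are hypothetical. The helper `first_starting_at_or_after` only illustrates the general idea of an access function that finds the interval nearest to a given position; the abstract does not name the four primitives.

```python
from bisect import bisect_left
from itertools import product

def nested(inner, outer):
    """True if `inner` lies within `outer` (inclusive bounds)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def minimal(intervals):
    """Assumed reduction: drop any interval that contains another one."""
    s = set(intervals)
    return sorted(i for i in s if not any(j != i and nested(j, i) for j in s))

def containing(a, b):
    return [x for x in a if any(nested(y, x) for y in b)]

def not_containing(a, b):
    return [x for x in a if not any(nested(y, x) for y in b)]

def contained_in(a, b):
    return [x for x in a if any(nested(x, y) for y in b)]

def not_contained_in(a, b):
    return [x for x in a if not any(nested(x, y) for y in b)]

def one_of(a, b):
    return minimal(list(a) + list(b))

def both_of(a, b):
    # Smallest span covering one interval from each operand, in either order.
    return minimal((min(x[0], y[0]), max(x[1], y[1])) for x, y in product(a, b))

def followed_by(a, b):
    # Span from an interval of `a` to a later, non-overlapping interval of `b`.
    return minimal((x[0], y[1]) for x, y in product(a, b) if x[1] < y[0])

def first_starting_at_or_after(intervals, k):
    """Hypothetical access function over a sorted list of minimal intervals."""
    starts = [s for s, _ in intervals]
    i = bisect_left(starts, k)
    return intervals[i] if i < len(intervals) else None

# Made-up data: two sections and the word positions of two terms.
sections = [(0, 9), (10, 19)]
algebra = [(3, 3), (15, 15)]
interval = [(5, 5)]

print(containing(sections, interval))       # sections mentioning "interval"
print(followed_by(algebra, interval))       # "algebra" ... "interval"
print(both_of(algebra, interval))           # both terms, any order
print(first_starting_at_or_after(algebra, 4))
```

With this data, `containing(sections, interval)` keeps only the first section, and `both_of` yields two spans because the terms co-occur in either order. The naive pairwise evaluation is quadratic in the operand sizes; the framework described in the abstract instead bounds search time by the cost of evaluating the access functions for the elementary terms.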
Pages: 43-56
Page count: 14