Querying documents in object databases

Cited by: 315
Authors
Abiteboul S. [1]
Cluet S. [1]
Christophides V. [1]
Milo T. [2]
Moerkotte G. [3]
Siméon J. [1]
Affiliations
[1] INRIA-Rocquencourt, F-78153 Le Chesnay Cedex
[2] Tel Aviv University, Ramat Aviv
[3] Lehrstuhl für Praktische Informatik III, Seminargebäude A5, Universität Mannheim
Keywords
Generalized path expressions; ODMG; OQL; Pattern matching
DOI
10.1007/s007990050001
Abstract
We consider the problem of storing and accessing documents (SGML and HTML, in particular) using database technology. To specify the database image of documents, we use structuring schemas that consist of grammars annotated with database programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension (named OQL-doc) allows us to query documents without precise knowledge of their structure, using in particular generalized path expressions and pattern matching. This allows us to introduce navigational and information-retrieval styles of data access into a declarative language (in the style of SQL or OQL). Query processing in the context of documents and path expressions leads to challenging implementation issues. We extend an object algebra with new operators to deal with generalized path expressions. We then consider two essential, complementary optimization techniques. First, we show that almost standard database optimization techniques can be used to answer queries without having to load the entire document into the database. Second, we consider the interaction of full-text indexes (e.g., inverted files) with standard database collection indexes (e.g., B-trees) that provide an important speed-up. © Springer-Verlag 1997.
Pages: 5-19
Page count: 14