Using linear algebra for intelligent information retrieval

Times cited: 740
Authors
Berry, MW [1 ]
Dumais, ST [1 ]
O'Brien, GW [1 ]
Institution
[1] Bellcore, Information Science Research Group, Morristown, NJ 07962
Keywords
indexing; information; latent; matrices; retrieval; semantic; singular value decomposition; sparse; updating
DOI
10.1137/1037127
Chinese Library Classification
O29 [Applied Mathematics]
Discipline code
070104
Abstract
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by determining the SVD of large sparse term-by-document matrices. Terms and documents represented by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method latent semantic indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users' access to many kinds of textual materials, or to documents and services for which textual descriptions are available. A survey of the computational requirements for managing LSI-encoded databases as well as current and future applications of LSI is presented.
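The retrieval scheme described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes NumPy's dense `np.linalg.svd` in place of the sparse SVD solvers a real collection would need, a made-up five-term, four-document count matrix with no term weighting, k = 2 instead of 200-300, and cosine similarity between the projected query q_hat = q^T U_k S_k^{-1} and the rows of V_k. The helper names `lsi_factors`, `fold_in` and `cosine_scores` are hypothetical. The same projection can "fold in" a new document, one simple way to update an LSI database without recomputing the SVD.

```python
import numpy as np


def lsi_factors(A, k):
    """Rank-k truncated SVD of a (terms x documents) matrix A.

    Returns U_k (terms x k), s_k (k largest singular values, descending)
    and V_k (documents x k); row j of V_k holds document j's coordinates.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(x, U_k, s_k):
    """Project a term-space vector (a query or a new document) into the
    k-dimensional space: x_hat = x^T U_k S_k^{-1}."""
    return (x @ U_k) / s_k


def cosine_scores(q_hat, V_k):
    """Cosine similarity between a projected query and every document row."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    # Guard against zero-length vectors to avoid division by zero.
    return (V_k @ q_hat) / np.where(norms == 0, 1.0, norms)


# Hypothetical toy term-by-document matrix (rows = terms, columns = d0..d3).
terms = ["car", "auto", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # auto
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

U_k, s_k, V_k = lsi_factors(A, k=2)

q = np.array([1, 0, 0, 0, 0], dtype=float)  # query: "car"
scores = cosine_scores(fold_in(q, U_k, s_k), V_k)
order = np.argsort(-scores)  # document indices, best match first

print("lexical matches:", A.T @ q)         # only d0 contains "car"
print("LSI scores:", np.round(scores, 3))  # d1 also scores high via "engine"
```

Because each column satisfies a_j = U S v_j, folding in an existing column returns exactly that document's row of V_k, which is why queries and documents can be compared in the same coordinates. In this toy example d1 never contains "car", yet it ranks alongside d0 because both share "engine", while the flower documents score near zero.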
Pages: 573-595
Page count: 23