INDEXING BY LATENT SEMANTIC ANALYSIS

Cited by: 112
Authors
DEERWESTER, S
DUMAIS, ST
FURNAS, GW
LANDAUER, TK
HARSHMAN, R
Affiliations
[1] BELL COMMUN RES INC, 445 S ST, MORRISTOWN, NJ 07960 USA
[2] UNIV CHICAGO, CTR INFORMAT & LANGUAGE STUDIES, CHICAGO, IL 60637 USA
[3] UNIV WESTERN ONTARIO, LONDON N6A 3K7, ONTARIO, CANADA
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE | 1990 / Vol. 41 / Issue 06
DOI
10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher‐order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular‐value decomposition, in which a large term‐by‐document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100‐item vectors of factor weights. Queries are represented as pseudo‐document vectors formed from weighted combinations of terms, and documents with supra‐threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising. © 1990 John Wiley & Sons, Inc.
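The pipeline the abstract describes (truncated singular-value decomposition of a term-by-document matrix, queries folded in as pseudo-documents, retrieval by cosine threshold) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy vocabulary and counts, the choice of k = 2 factors instead of ca. 100, the 0.9 threshold, the function names, and the convention of scaling document coordinates by the singular values before taking cosines.

```python
import numpy as np

# Hypothetical toy term-by-document count matrix (rows: terms, columns: documents).
terms = ["human", "computer", "interface", "user", "graph", "tree", "minors"]
X = np.array([
    [1, 1, 0, 0],   # human
    [1, 1, 0, 0],   # computer
    [1, 0, 0, 0],   # interface (appears only in document 0)
    [0, 1, 0, 0],   # user
    [0, 0, 1, 1],   # graph
    [0, 0, 1, 1],   # tree
    [0, 0, 0, 1],   # minors
], dtype=float)


def lsa_fit(X, k):
    """Keep the k largest singular triplets of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T   # term factors, singular values, doc factors


def retrieve(q, U_k, s_k, V_k, threshold):
    """Cosine between a folded-in query and every document in factor space."""
    docs = V_k * s_k        # document coordinates, scaled by singular values (assumed convention)
    q_hat = q @ U_k         # query projected as a pseudo-document in the same scaled space
    cosines = docs @ q_hat / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat))
    return cosines, np.flatnonzero(cosines >= threshold)


U_k, s_k, V_k = lsa_fit(X, k=2)

# Rank-k approximation of the original matrix by linear combination of the factors.
X_k = (U_k * s_k) @ V_k.T

# One-term query: "interface".
q = np.zeros(len(terms))
q[terms.index("interface")] = 1.0

cosines, hits = retrieve(q, U_k, s_k, V_k, threshold=0.9)
print("cosines:", np.round(cosines, 3))
print("documents returned:", hits.tolist())
```

In this toy collection, document 1 does not contain the query term, yet it shares "human" and "computer" with document 0, so both land on the same factor and both exceed the threshold. That is the kind of higher-order association the abstract refers to, shown here only on made-up data.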
Pages: 391-407
Page count: 17