An explanation of the effectiveness of latent semantic indexing by means of a Bayesian regression model

Cited by: 23
Author
Story, RE
Affiliation
[1] Bell Commun. Research (Bellcore), Red Bank, NJ 07701-5699
DOI
10.1016/0306-4573(95)00055-0
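Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract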
Latent Semantic Indexing (LSI) is an effective automated method for determining if a document is relevant to a reader based on a few words or an abstract describing the reader's needs. A particular feature of LSI is its ability to deal automatically with synonyms. LSI generally is explained in terms of a mathematical concept called the Singular Value Decomposition and statistical methods such as factor analysis. This paper looks at LSI from a different perspective, comparing it to statistical regression and Bayesian methods. The relationships found can be useful in explaining the performance of LSI and in suggesting variations on the LSI approach. © 1996 Elsevier Science Ltd
Pages: 329-344
Number of pages: 16