An explanation of the effectiveness of latent semantic indexing by means of a Bayesian regression model

Cited by: 23
Author
Story, RE
Affiliation
[1] Bell Commun. Research (Bellcore), Red Bank, NJ 07701-5699
DOI
10.1016/0306-4573(95)00055-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Semantic Indexing (LSI) is an effective automated method for determining if a document is relevant to a reader based on a few words or an abstract describing the reader's needs. A particular feature of LSI is its ability to deal automatically with synonyms. LSI generally is explained in terms of a mathematical concept called the Singular Value Decomposition and statistical methods such as factor analysis. This paper looks at LSI from a different perspective, comparing it to statistical regression and Bayesian methods. The relationships found can be useful in explaining the performance of LSI and in suggesting variations on the LSI approach. (C) 1996 Elsevier Science Ltd
Pages: 329-344
Page count: 16