A similarity-based probability model for latent semantic indexing

Times cited: 23

Author:

Ding, CHQ [1]

Affiliation:

[1] Univ Calif Berkeley, Lawrence Berkeley Lab, NERSC Div, Berkeley, CA 94720 USA

Source:

SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 1999

DOI:

10.1145/312624.312652

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

A dual probability model is constructed for Latent Semantic Indexing (LSI) using the cosine similarity measure. Both the document-document similarity matrix and the term-term similarity matrix arise naturally from the maximum likelihood estimation of the model parameters, and the optimal solutions are the latent semantic vectors of LSI. Dimensionality reduction is justified by the statistical significance of the latent semantic vectors, as measured by the likelihood of the model. This leads to a statistical criterion for the optimal number of semantic dimensions, answering a critical open question in LSI of practical importance. The model thus establishes a statistical framework for LSI. Ambiguities related to the statistical modeling of LSI are clarified.
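The abstract's central claim is that maximizing the likelihood over the similarity matrices yields the latent semantic vectors of LSI. The paper's exact likelihood is not reproduced in this record, so the following is only a minimal sketch of the standard linear-algebra fact that claim rests on, assuming a term-by-document matrix A whose document columns are normalized to unit length: for a unit vector u, u^T (A A^T) u is the sum of squared cosine similarities between u and the documents, and its maximizer is the top eigenvector of the term-term similarity matrix A A^T, which is the first left singular vector of A. Dually, the document-document matrix A^T A gives the first right singular vector, and both share the eigenvalue sigma_1^2. The function names and toy data are hypothetical.

```python
import numpy as np

def normalize_columns(A):
    """Scale each document vector (column) to unit length so dot products are cosines."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    return A / norms

def lsi(A, k):
    """Rank-k LSI: truncated SVD of the term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def top_direction(S):
    """Unit vector u maximizing u^T S u for symmetric S, i.e. its top eigenvector."""
    w, V = np.linalg.eigh(S)  # eigenvalues returned in ascending order
    return V[:, -1], w[-1]

# Hypothetical toy data: 6 terms x 4 documents with nonnegative weights.
rng = np.random.default_rng(0)
A = normalize_columns(rng.random((6, 4)))
U, s, Vt = lsi(A, k=2)

# For unit u, u^T (A A^T) u = sum_i cos(d_i, u)^2 over documents d_i.
u_term, lam_term = top_direction(A @ A.T)  # term-term similarity matrix
v_doc, lam_doc = top_direction(A.T @ A)    # document-document similarity matrix

# Both matrices share the top eigenvalue sigma_1^2, and their maximizers
# coincide (up to sign) with the first left/right LSI vectors.
print(np.isclose(lam_term, s[0] ** 2), np.isclose(lam_doc, s[0] ** 2))
print(np.isclose(abs(u_term @ U[:, 0]), 1.0), np.isclose(abs(v_doc @ Vt[0]), 1.0))
```

Both checks should print True. `eigh` is used because both similarity matrices are symmetric; the sign comparison uses absolute values because singular vectors are only determined up to sign.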

Pages: 58-65

Number of pages: 8

References (15 in total):

[1] Bartell, BT. J Am Soc Inform Sci, 1995, 46: 251.
[2] Berry, MW; Dumais, ST; O'Brien, GW. Using linear algebra for intelligent information retrieval. SIAM Review, 1995, 37(4): 573-595.
[3] Deerwester, S. J Am Soc Inform Sci, 1990, 41: 391. DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[4] Dumais, ST. Improving the retrieval of information from external sources. Behavior Research Methods, Instruments & Computers, 1991, 23(2): 229-236.
[5] Dumais, ST. Overview TREC 3, 1995.
[6] Fuhr, N. Probabilistic models in information retrieval. Computer Journal, 1992, 35(3): 243-255.
[7] Papadimitriou, CH. P S Princ Dat Syst P, 1998.
[8] Salton, G; Buckley, C. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 1988, 24(5): 513-523.
[9] Salton, G. Intro Modern Informa, 1983.
