Using latent semantic indexing for multilanguage information retrieval

Cited by: 20
Authors
Berry, MW [1]
Young, PG [1]
Affiliations
[1] UNIV TENNESSEE,DEPT COMP SCI,KNOXVILLE,TN 37920
Source
COMPUTERS AND THE HUMANITIES | 1995 / Vol. 29 / No. 06
Keywords
Bible; English; Gospels; Greek; Hebrew; information retrieval; latent semantic indexing; singular value decomposition
DOI
10.1007/BF01829874
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, a method for indexing cross-language databases for conceptual query matching is presented. Two languages (Greek and English) are combined by appending a small portion of documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database (made up of different translations of the Gospels). Previous strategies duplicated up to 34% of the initial database in order to perform the merger. The proposed method retrieves a larger number of relevant documents for both languages with higher cosine rankings when Latent Semantic Indexing (LSI) is employed. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow natural-language queries; LSI enables users to form queries using natural expressions in their own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively while duplicating only a minimal portion of the database in a foreign language.
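The mechanism summarized in the abstract (merged dual-language documents, a truncated singular value decomposition, and cosine ranking of an untranslated query) can be illustrated with a minimal sketch. This is not the authors' implementation or data: the six-document corpus, the transliterated Greek placeholder words, the raw term counts (no term weighting), the choice k = 2, and the helper names `term_vector` and `lsi_scores` are all assumptions made for illustration.

```python
import numpy as np

# Toy corpus (hypothetical, not the Gospel translations used in the paper).
# Greek words are transliterated placeholders.
docs = [
    "light world phos kosmos",  # merged: English text + its Greek counterpart
    "bread life artos zoe",     # merged: English text + its Greek counterpart
    "light world",              # English only
    "phos kosmos",              # Greek only
    "bread life",               # English only
    "artos zoe",                # Greek only
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def term_vector(text):
    """Raw term-count vector over the shared two-language vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

# Term-by-document matrix: one column per document.
A = np.column_stack([term_vector(d) for d in docs])

# Rank-k truncated SVD, A ~ U_k diag(s_k) V_k^T (k = 2 is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

def lsi_scores(query):
    """Cosine between the projected query and each document in k-space."""
    q_hat = term_vector(query) @ U_k / s_k  # q^T U_k S_k^{-1}
    num = V_k @ q_hat
    den = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    return num / den

scores = lsi_scores("light")  # English query, no translation step
for d, c in sorted(zip(docs, scores), key=lambda p: -p[1]):
    print(f"{c:+.2f}  {d}")
```

In this toy setup the Greek-only document "phos kosmos" shares no term with the English query "light", yet it should rank alongside the English documents on the same topic, because the merged document links the two vocabularies in the reduced space. Without the two merged documents, the English and Greek terms would never co-occur and the query could not reach the Greek-only documents.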
Pages: 413-429
Number of pages: 17