Automatic Text Summarization Using Latent Semantic Analysis

Cited by: 21
Authors
Mashechkin, I. V. [1]
Petrovskiy, M. I. [1]
Popov, D. S. [1]
Tsarev, D. V. [1]
Affiliations
[1] Moscow MV Lomonosov State Univ, Dept Computat Math & Cybernet, Moscow 119991, Russia
Keywords
Singular Value Decomposition; Latent Semantic Analysis; Original Text; Nonnegative Matrix Factorization; Model Summary;
DOI
10.1134/S0361768811060041
Chinese Library Classification number
TP31 [Computer Software];
Subject classification codes
081202 ; 0835 ;
Abstract
This paper considers state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix whose columns correspond to the sentences of the text, so that each sentence is a vector in the term space. Latent semantic analysis is then applied to this matrix to construct a representation of the sentences in the topic space, whose dimensionality is much lower than that of the initial term space. The most important sentences are chosen on the basis of their representation in the topic space, and the number of sentences chosen is determined by the required summary length. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the sentence representations in the topic space. The proposed method shows better summarization quality and performance than state-of-the-art methods on the standard DUC 2001 and DUC 2002 data sets.
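The general pipeline described in the abstract (term-by-sentence matrix, projection into a low-dimensional topic space, sentence selection by relevance) can be illustrated with a minimal sketch. This is not the authors' method: it uses a plain singular value decomposition and scores each sentence by the norm of its singular-value-weighted topic vector, which is an assumption made here for illustration. The function names, the regular-expression tokenizer, the raw term counts, and the default parameters are all hypothetical, and the paper's NMF-based topic normalization and weighting are not reproduced.

```python
import re
import numpy as np

def term_sentence_matrix(sentences):
    """Build a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    # Assumption: a naive lowercase alphabetic tokenizer and raw term counts.
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return A, vocab

def lsa_sentence_scores(A, n_topics):
    """Score sentences by the length of their vectors in a truncated topic space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(n_topics, len(s))
    # Column j of diag(s[:k]) @ Vt[:k] represents sentence j in the topic space.
    topic_repr = s[:k, None] * Vt[:k, :]
    # Assumption: relevance is taken to be the Euclidean norm of that vector.
    return np.linalg.norm(topic_repr, axis=0)

def summarize(sentences, n_topics=2, summary_len=2):
    """Return the summary_len highest-scoring sentences in their original order."""
    A, _ = term_sentence_matrix(sentences)
    scores = lsa_sentence_scores(A, n_topics)
    top = np.argsort(-scores)[:summary_len]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(list_of_sentences, n_topics=2, summary_len=2)` returns two of the input sentences as an extract. Here the summary length is given as a sentence count, which is one simple way to express the required summary length mentioned in the abstract.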
Pages: 299-305
Page count: 7