A dynamic programming algorithm for linear text segmentation

被引:54
作者
Fragkou, P
Petridis, V
Kehagias, A
机构
关键词
text segmentation; information retrieval; document retrieval; machine learning;
D O I
10.1023/B:JIIS.0000039534.65423.00
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we introduce a dynamic programming algorithm which performs linear text segmentation by global minimization of a segmentation cost function which incorporates two factors: (a)within-segment word similarity and (b) prior information about segment length. We evaluate segmentation accuracy of the algorithm by precision, recall and Beeferman's segmentation metric. On a segmentation task which involves Choi's text collection, the algorithm achieves the best segmentation accuracy so far reported in the literature. The algorithm also achieves high accuracy on a second task which involves previously unused texts.
引用
收藏
页码:179 / 197
页数:19
相关论文
共 39 条
[1]  
[Anonymous], 1977, ROGETS INT THESAURUS
[2]  
[Anonymous], P 6 WORKSH VER LARG
[3]  
[Anonymous], 1998, THESIS U PENNSYLVANI
[4]  
[Anonymous], 1993, P 6 C EUROPEAN CHAPT
[5]  
[Anonymous], 1996, P 19 ANN INT ACM SIG, DOI DOI 10.1145/243199.243202
[6]  
[Anonymous], P 31 ANN M ASS COMP, DOI DOI 10.1016/S0306-4573(02)00035-3
[7]  
[Anonymous], 2000, P 1 N AM CHAPTER ASS
[8]   Statistical models for text segmentation [J].
Beeferman, D ;
Berger, A ;
Lafferty, J .
MACHINE LEARNING, 1999, 34 (1-3) :177-210
[9]  
Beeferman D, 1997, 35TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 8TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, P373
[10]  
Beeferman D., 1997, P 2 C EMP METH NAT L, P35