A critique and improvement of an evaluation metric for text segmentation

Cited by: 184
Authors
Pevzner, L
Hearst, MA
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
DOI
10.1162/089120102317341756
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The P_k evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the P_k metric that remedies these problems. This new metric, called WindowDiff, moves a fixed-sized window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.
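The windowing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `window_diff`, the encoding of a segmentation as a list with one 0/1 entry per gap between adjacent text units (1 marks a boundary), and the explicit `k` argument are all assumptions made for illustration. The window size is commonly set to about half the average reference segment length, but here the caller supplies it.

```python
def window_diff(reference, hypothesis, k):
    """Return the fraction of windows whose boundary counts disagree.

    reference, hypothesis: lists of 0/1 flags, one per gap between
        adjacent text units (hypothetical encoding; 1 = boundary).
    k: window size, measured in gaps (assumed to be chosen by the caller).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    if not 0 < k <= len(reference):
        raise ValueError("k must be between 1 and the number of gaps")

    n_windows = len(reference) - k + 1
    errors = 0
    for start in range(n_windows):
        # Count boundaries inside the current window for each segmentation.
        ref_count = sum(reference[start:start + k])
        hyp_count = sum(hypothesis[start:start + k])
        # Penalize the window whenever the two counts differ.
        if ref_count != hyp_count:
            errors += 1
    return errors / n_windows


if __name__ == "__main__":
    ref = [0, 0, 1, 0, 0, 0, 1, 0, 0]
    near_miss = [0, 0, 0, 1, 0, 0, 1, 0, 0]  # first boundary is off by one gap
    print(window_diff(ref, ref, 3))
    print(window_diff(ref, near_miss, 3))
```

In this sketch a score of 0 means every window agrees with the reference, and the score grows toward 1 as more windows contain a different number of boundaries than the reference does.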
Pages: 19-36
Page count: 18
Related papers
38 in total
[1] Allan J., 1998, P DARPA BROADCAST NE, P194
[2] [Anonymous], P 6 WORKSH VER LARG
[3] [Anonymous], 1998, SULTRY980701 U SYDN
[4] BAEZAYATES RA, 1999, MODERN INFORMATION R
[5] BARZILAY R, 1997, P ACL INT SCAL TEXT
[6] Beeferman, D; Berger, A; Lafferty, J. Statistical models for text segmentation [J]. MACHINE LEARNING, 1999, 34 (1-3): 177-210
[7] Beeferman D., 1997, P 2 C EMP METH NAT L, P35
[8] BOX, GEP; MULLER, ME. A NOTE ON THE GENERATION OF RANDOM NORMAL DEVIATES [J]. ANNALS OF MATHEMATICAL STATISTICS, 1958, 29 (02): 610-611
[9] Callan J. P., 1994, SIGIR '94. Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, P302
[10] Choi FYY, 2000, 6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, pA26