Automatic summarization of open-domain multiparty dialogues in diverse genres

Cited by: 65
Authors
Zechner, K [1]
Affiliation
[1] Educational Testing Service, Princeton, NJ 08541 USA
DOI
10.1162/089120102762671945
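Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology]
Abstract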
Automatic summarization of open-domain spoken dialogues is a relatively new research area. This article introduces the task and the challenges involved and motivates and presents an approach for obtaining automatic-extract summaries for human transcripts of multiparty dialogues of four different genres, without any restriction on domain. We address the following issues, which are intrinsic to spoken-dialogue summarization and typically can be ignored when summarizing written text such as news wire data: (1) detection and removal of speech disfluencies; (2) detection and insertion of sentence boundaries; and (3) detection and linking of cross-speaker information units (question-answer pairs). A system evaluation is performed using a corpus of 23 dialogue excerpts with an average duration of about 10 minutes, comprising 80 topical segments and about 47,000 words total. The corpus was manually annotated for relevant text spans by six human annotators. The global evaluation shows that for the two more informal genres, our summarization system using dialogue-specific components significantly outperforms two baselines: (1) a maximum-marginal-relevance ranking algorithm using TF*IDF term weighting, and (2) a LEAD baseline that extracts the first n words from a text.
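The two baselines named in the abstract can be illustrated with a minimal sketch. This is not the article's implementation: the function names, the `lam` trade-off parameter, the smoothed IDF formula, and the use of the segment centroid as the relevance query are all assumptions made for illustration. The ranking follows the commonly described maximum-marginal-relevance idea of trading relevance against redundancy with already selected items.

```python
import math
from collections import Counter


def lead_baseline(words, n):
    """LEAD baseline: return the first n words of the text."""
    return words[:n]


def tfidf_vectors(segments):
    """Build sparse TF*IDF vectors (dicts) for a list of tokenized segments.

    The smoothed IDF, log(1 + N / df), is an assumption of this sketch.
    """
    n_docs = len(segments)
    df = Counter(term for seg in segments for term in set(seg))
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        vectors.append({t: tf[t] * math.log(1 + n_docs / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(segments, k, lam=0.5):
    """Return indices of up to k segments chosen greedily by MMR.

    Each step picks the segment maximizing
    lam * sim(segment, query) - (1 - lam) * max sim(segment, selected).
    The query is the sum of all segment vectors (an assumption here).
    """
    vectors = tfidf_vectors(segments)
    query = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            query[term] += weight

    selected = []
    candidates = list(range(len(segments)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * cosine(vectors[i], query) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, `mmr_rank(segments, k=2)` on a list of tokenized utterances returns the indices of two segments that are relevant to the dialogue as a whole while overlapping little with each other, whereas `lead_baseline(words, n)` simply truncates the transcript.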
Pages: 447-485
Page count: 39