Integrating prosodic and lexical cues for automatic topic segmentation

Cited by: 52
Authors
Tür, G
Hakkani-Tür, D
Stolcke, A
Shriberg, E
Affiliations
[1] Bilkent Univ, Dept Comp Engn, TR-06533 Ankara, Turkey
[2] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
DOI
10.1162/089120101300346796
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources.
Pages: 31-57
Page count: 27
Related papers
47 in total
[1] Allan J., 1998, P DARPA BROADCAST NE, P194
[2] [Anonymous], P 31 ANN M ASS COMP, DOI 10.1016/S0306-4573(02)00035-3
[3] [Anonymous], P DARPA BROADC NEWS
[4] Ayers G., 1994, WORKING PAPERS LINGU, V44, P1
[5] A MAXIMIZATION TECHNIQUE OCCURRING IN STATISTICAL ANALYSIS OF PROBABILISTIC FUNCTIONS OF MARKOV CHAINS [J]. BAUM, LE; PETRIE, T; SOULES, G; WEISS, N. ANNALS OF MATHEMATICAL STATISTICS, 1970, 41 (01): 164-&
[6] Statistical models for text segmentation [J]. Beeferman, D; Berger, A; Lafferty, J. MACHINE LEARNING, 1999, 34 (1-3): 177-210
[7] Breiman L., 1984, BIOMETRICS, DOI 10.2307/2530946
[8] RATE AND PAUSE CHARACTERISTICS OF ORAL READING [J]. BRUBAKER, RS. JOURNAL OF PSYCHOLINGUISTIC RESEARCH, 1972, 1 (02): 141-147
[9] BUNTINE W, 1992, INTRO IND VERSION 2
[10] Cieri C., 1999, P DARPA BROADCAST NE, P57