Subword-based approaches for spoken document retrieval

Cited by: 67

Authors:

Ng, K [1]

Zue, VW [1]

Affiliation:

[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA

Source:

Speech Communication | 2000 / Volume 32 / Issue 3

Keywords:

spoken document retrieval; audio indexing; information retrieval

DOI:

10.1016/S0167-6393(00)00008-X

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206 [Acoustics]; 082403 [Underwater Acoustic Engineering]

Abstract:

This paper explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. In this study, we explore the space of possible subword units to determine the complexity of the subword units needed for SDR; describe the development and application of a phonetic recognition system to extract subword units from the speech signal; examine the behavior and sensitivity of the subword units to speech recognition errors; measure the effect of speech recognition performance on retrieval performance; and investigate a number of robust indexing and retrieval methods in an effort to improve retrieval performance in the presence of speech recognition errors. We find that with the appropriate subword units, it is possible to achieve performance comparable to that of text-based word units if the underlying phonetic units are recognized correctly. In the presence of speech recognition errors, retrieval performance degrades to 60% of the clean reference level. This performance can be improved by 23% (to 74% of the clean reference) with use of the robust methods. © 2000 Elsevier Science B.V. All rights reserved.
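The closing figures of the abstract mix two scales: 60% and 74% are fractions of the clean reference level, while 23% reads most naturally as a relative gain over the degraded 60% level. The following is a minimal arithmetic check of that reading, not something taken from the paper; the function name `relative_gain` and the variable names are illustrative assumptions.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


# Figures quoted in the abstract, as fractions of the clean reference level.
degraded = 0.60  # retrieval performance with speech recognition errors
robust = 0.74    # retrieval performance after applying the robust methods

gain = relative_gain(degraded, robust)
absolute_points = (robust - degraded) * 100

print(f"relative gain: {gain:.1%}")                   # about 23%
print(f"absolute gain: {absolute_points:.0f} points")  # 14 points of the clean reference
```

Under this reading, the robust methods recover 14 percentage points of the clean reference level, which is roughly a 23% improvement relative to the degraded system.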

Pages: 157-186

Page count: 30
