Template-based continuous speech recognition

Cited by: 94
Authors
De Wachter, Mathias [1]
Matton, Mike
Demuynck, Kris
Wambacq, Patrick
Cools, Ronald
Van Compernolle, Dirk
Affiliations
[1] Katholieke Univ Leuven, Elect Engn Dept ESAT, Speech Proc Res Grp, B-3000 Louvain, Belgium
[2] Katholieke Univ Leuven, Dept Comp Sci, NINES Grp, B-3000 Louvain, Belgium
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 04
Keywords
dynamic time warping (DTW); episodic modeling; example-based recognition;
DOI
10.1109/TASL.2007.894524
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known dynamic time warping (DTW) algorithm. However, classical DTW-based continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class-sensitive distance measure, two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to the complementarity of errors between the HMM and DTW systems, the combination of both leads to a decrease in word error rate of 17% compared to the HMM results.
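For readers unfamiliar with the DTW algorithm named in the abstract, the following is a minimal sketch of textbook DTW used for isolated template matching. It is not the recognizer described in the paper: it has no subword units, no class-sensitive distance, and no data-driven candidate selection. The function names `dtw_distance` and `classify`, the Euclidean local distance, and the symmetric step pattern are all assumptions made for illustration.

```python
import math


def dtw_distance(query, template):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumptions (not from the paper): Euclidean local distance and the
    basic step pattern (horizontal, vertical, diagonal) with equal weights.
    """
    n, m = len(query), len(template)
    # cost[i][j] = minimal accumulated distance aligning query[:i] with template[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(query[i - 1], template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query frame advances, template frame repeats
                cost[i][j - 1],      # template frame advances, query frame repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the stored template closest to the query.

    `templates` is a list of (label, sequence) pairs; this exhaustive scan
    is the naive baseline, not a search strategy from the paper.
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```

Because every query frame may be compared against every frame of every stored template, the cost of this naive scan grows with the size of the template database, which gives some intuition for why an unconstrained search becomes expensive for continuous speech.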
Pages: 1377-1390
Page count: 14