Vocabulary expansion through automatic abbreviation generation for Chinese voice search

Cited by: 12
Authors
Yang, Dong [1]
Pan, Yi-Cheng [1]
Furui, Sadaoki [1]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Ookayama, Tokyo 1528552, Japan
Keywords
Automatic abbreviation generation; Vocabulary expansion; Voice search
DOI
10.1016/j.csl.2011.12.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Long organization names are often abbreviated in spoken Chinese, and abbreviated utterances cannot be recognized correctly if the abbreviations are not included in the recognition vocabulary. It is therefore important to generate abbreviations for organization names automatically and add them to the vocabulary. Generating Chinese abbreviations is much more complex than generating English abbreviations, which are mostly acronyms and truncations. In this paper, we propose a new hybrid method for automatically generating Chinese abbreviations, and we perform vocabulary expansion for voice search using the output of the abbreviation model. In our abbreviation modeling, we treat abbreviation generation as a tagging problem and use conditional random fields (CRF) as the tagging tool; the CRF output is then re-ranked by a length model and web information. In the vocabulary expansion, considering that a name may have multiple abbreviations and that the top-1 abbreviation candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments on abbreviation modeling, the proposed method achieved a top-10 coverage of 88.3%. For voice search using abbreviated utterances, we improved the full-name search accuracy from 16.9% to 79.2% by incorporating the top-10 abbreviation candidates into the vocabulary. (C) 2012 Elsevier Ltd. All rights reserved.
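The tagging formulation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-label scheme (`K` for keep, `S` for skip), the function names, and in particular `toy_score` are assumptions made for illustration. `toy_score` is a hypothetical stand-in for the CRF score combined with the length model, and web-based re-ranking is omitted. Exhaustive enumeration of tag sequences is only practical for short names.

```python
from itertools import product


def apply_tags(full_name: str, tags: str) -> str:
    """Keep characters tagged 'K' (keep); drop those tagged 'S' (skip)."""
    return "".join(ch for ch, tag in zip(full_name, tags) if tag == "K")


def toy_score(tags: str) -> int:
    """Hypothetical stand-in for a CRF score plus a length model.

    Prefers two-character abbreviations and rewards keeping the first character.
    """
    kept = tags.count("K")
    return -abs(kept - 2) + (1 if tags[0] == "K" else 0)


def top_k_abbreviations(full_name: str, k: int = 10) -> list[str]:
    """Enumerate every tag sequence, score it, and return the k best candidates."""
    n = len(full_name)
    best = {}  # abbreviation -> best score seen (deduplicates repeated characters)
    for combo in product("KS", repeat=n):
        tags = "".join(combo)
        kept = tags.count("K")
        if kept == 0 or kept == n:  # skip the empty string and the full name
            continue
        abbr = apply_tags(full_name, tags)
        score = toy_score(tags)
        best[abbr] = max(best.get(abbr, score), score)
    # Highest score first; ties are broken by string order for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [abbr for abbr, _ in ranked[:k]]


def expand_vocabulary(full_names, k: int = 10) -> dict:
    """Map each full name and its top-k abbreviation candidates to full names."""
    vocab = {}
    for name in full_names:
        vocab.setdefault(name, set()).add(name)
        for abbr in top_k_abbreviations(name, k):
            vocab.setdefault(abbr, set()).add(name)
    return vocab
```

For example, `apply_tags("北京大学", "KSKS")` yields `"北大"`, and `expand_vocabulary(["北京大学"])` maps `"北大"` back to the full name, so a recognizer that hears the abbreviation can still retrieve the full-name entry. Adding several candidates per name, rather than only the best one, reflects the abstract's point that a name may have more than one abbreviation in use.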
Pages: 321-335
Number of pages: 15