Phonetic searching vs. LVCSR: How to find what you really want in audio archives

Cited by: 16
Authors
Cardillo P.S. [1]
Clements M. [1]
Miller M.S. [1]
Affiliation
[1] Fast-Talk Communications Inc., Atlanta, GA
Keywords
Digital media asset management (DMAM); Large vocabulary continuous speech recognition (LVCSR); Phonetic searching
DOI
10.1023/A:1013670312989
Abstract
A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR, with inherent problems of closed vocabulary and high word error rate), phonetic searching combines high speed and accuracy, supports open vocabulary, imposes low penalty for new words, permits phonetic and inexact spelling, enables user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one popular embodiment of LVCSR is presented along with other operating characteristics of the new technique. The current implementation for Digital Media Asset Management (DMAM) is described along with suggested applications in other domains.
Pages: 9-22
Page count: 13