Distributed speech processing in MiPad's multimodal user interface

Cited by: 23

Authors:

Deng, L [1]

Wang, KS

Acero, A

Hon, HW

Droppo, J

Boulis, C

Wang, YY

Jacoby, D

Mahajan, M

Chelba, C

Huang, XD

Affiliations:

[1] Microsoft Corp, Res, Redmond, WA 98052 USA

[2] Univ Washington, Seattle, WA 98195 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 8

Funding:

São Paulo Research Foundation (FAPESP); US National Science Foundation;

Keywords:

client-server computing; distributed speech recognition; error protection; mobile computing; noise robustness; speech-enabled applications; speech feature compression; speech processing systems;

DOI:

10.1109/TSA.2002.804538

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper describes the main components of MiPad (Multimodal Interactive PAD), especially its distributed speech processing aspects. MiPad is a wireless mobile PDA prototype that enables users to accomplish many common tasks using a multimodal spoken language interface and wireless-data technologies. It fully integrates continuous speech recognition and spoken language understanding, and provides a novel solution for data entry in PDAs or smart phones, which is otherwise often done by pecking with tiny styluses or typing on minuscule keyboards. Our user study indicates that the throughput of MiPad is significantly superior to that of the existing pen-based PDA interface. Acoustic modeling and noise robustness in distributed speech recognition are key components in MiPad's design and implementation. In a typical scenario, the user speaks to the device from a distance so that he or she can see the screen. The built-in microphone therefore picks up a lot of background noise, which requires MiPad to be noise robust. For complex tasks, such as dictating e-mails, resource limitations demand the use of a client-server (peer-to-peer) architecture, in which the PDA performs primitive feature extraction, feature quantization, and error protection, while the features transmitted to the server undergo further speech feature enhancement, speech decoding, and understanding before a dialog is carried out and actions are rendered. Noise robustness can be achieved at the client, at the server, or both. Various speech processing aspects of this type of distributed computation, as related to MiPad's potential deployment, are presented in this paper. Recent user interface study results are also described. Finally, we point out future research directions related to several key MiPad functionalities.
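The client-side chain described in the abstract (feature quantization plus error protection on the PDA, with the server recovering the features before further processing) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not MiPad's actual codec: uniform 6-bit scalar quantization over an assumed coefficient range of [-20, 20], and a CRC-32 checksum (error detection only, no correction) standing in for whatever error-protection scheme the paper uses. The names `encode_frame` and `decode_frame` are hypothetical.

```python
import struct
import zlib
from typing import List, Optional

# Hypothetical parameters for illustration only (not taken from the paper).
BITS = 6                   # bits per coefficient
LEVELS = (1 << BITS) - 1   # highest quantization index (63)
LO, HI = -20.0, 20.0       # assumed dynamic range of each feature coefficient


def quantize(x: float) -> int:
    """Uniform scalar quantization of one coefficient to an index in [0, LEVELS]."""
    x = min(max(x, LO), HI)  # clip to the assumed range
    return round((x - LO) / (HI - LO) * LEVELS)


def dequantize(q: int) -> float:
    """Map a quantization index back to its reconstruction value."""
    return LO + q / LEVELS * (HI - LO)


def encode_frame(features: List[float]) -> bytes:
    """Client side: quantize one frame and append a CRC-32 for error detection."""
    payload = bytes(quantize(x) for x in features)  # one index per byte, for simplicity
    return payload + struct.pack(">I", zlib.crc32(payload))


def decode_frame(packet: bytes) -> Optional[List[float]]:
    """Server side: verify the CRC; return None for a corrupted frame, else the features."""
    payload, (crc,) = packet[:-4], struct.unpack(">I", packet[-4:])
    if zlib.crc32(payload) != crc:
        return None  # caller may conceal the loss, e.g. by repeating the previous frame
    return [dequantize(q) for q in payload]
```

Returning `None` for a frame that fails the checksum lets the server apply its own concealment before decoding, instead of passing corrupted features on to the recognizer.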


Pages: 605–619

Page count: 15

