A review of speech-based bimodal recognition

Times cited: 122

Authors:

Chibelushi, CC [1]

Deravi, F

Mason, JSD

Affiliations:

[1] Staffordshire Univ, Sch Comp, Stafford ST18 0DG, Staffs, England

[2] Univ Kent, Elect Engn Lab, Canterbury CT2 7NT, Kent, England

[3] Univ Coll Swansea, Dept Elect & Elect Engn, Swansea SA2 8PP, W Glam, Wales

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2002 / Vol. 4 / No. 1

Keywords:

audio-visual fusion; joint media processing; multimodal recognition; speaker recognition; speech recognition

DOI:

10.1109/6046.985551

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. This paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.
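As a generic illustration of how combining the auditory and visual modalities can recover from a noisy acoustic decision, here is a minimal sketch of weighted late (score-level) fusion. It is not presented as the method of this paper or of any recognizer it reviews. It assumes each modality's classifier emits per-class log-likelihood scores on a comparable scale; the function names `fuse_scores` and `decide`, the weight `alpha`, and the example scores are all hypothetical.

```python
from typing import Dict


def fuse_scores(audio: Dict[str, float], visual: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """Weighted late fusion: alpha * audio + (1 - alpha) * visual per class.

    Assumption: both inputs are per-class log-likelihoods on a comparable
    scale; alpha = 1.0 means audio only, alpha = 0.0 means visual only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if audio.keys() != visual.keys():
        raise ValueError("both modalities must score the same classes")
    return {c: alpha * audio[c] + (1.0 - alpha) * visual[c] for c in audio}


def decide(scores: Dict[str, float]) -> str:
    """Return the class with the highest (fused) score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical scores: noisy audio slightly favours "ga",
    # while the lip-shape (visual) evidence strongly favours "ba".
    audio = {"ba": -4.0, "ga": -3.5, "da": -5.0}
    visual = {"ba": -1.0, "ga": -6.0, "da": -4.0}

    print("audio only:", decide(audio))
    print("fused (alpha=0.5):", decide(fuse_scores(audio, visual, 0.5)))
```

With `alpha = 0.5` the fused scores are ba = -2.5, ga = -4.75, da = -4.5, so the decision flips from the audio-only "ga" to "ba". In a practical system the weight would typically be tuned, or adapted to the estimated acoustic noise level, rather than fixed; that adaptation is outside this sketch.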

Pages: 23-37

Number of pages: 15

References: 131
