A support vector machine-based dynamic network for visual speech recognition applications

Cited by: 19
Authors
Gordan, M [1]
Kotropoulos, C [1]
Pitas, I [1]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, GR-54006 Thessaloniki, Greece
Keywords
visual speech recognition; mouth shape recognition; visemes; phonemes; support vector machines; Viterbi lattice
DOI
10.1155/S1110865702207039
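Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract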
Visual speech recognition is an emerging research field. In this paper, we examine the suitability of support vector machines for visual speech recognition. Each word is modeled as a temporal sequence of visemes corresponding to the different phones realized. One support vector machine is trained to recognize each viseme and its output is converted to a posterior probability through a sigmoidal mapping. To model the temporal character of speech, the support vector machines are integrated as nodes into a Viterbi lattice. We test the performance of the proposed approach on a small visual speech recognition task, namely the recognition of the first four digits in English. The word recognition rate obtained is at the level of the previous best reported rates.
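The pipeline outlined in the abstract (one SVM per viseme, a sigmoidal mapping to a posterior, and a Viterbi lattice over time) can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid parameters `a` and `b`, the viseme labels, the toy lexicon, and the unweighted stay-or-advance transitions are all assumptions made for illustration, and the raw SVM outputs are supplied by hand rather than computed from mouth images.

```python
import math

NEG_INF = float("-inf")


def sigmoid_posterior(f, a=-1.0, b=0.0):
    """Map a raw SVM output f to (0, 1) via 1 / (1 + exp(a*f + b)).

    a and b are hypothetical values; in practice they would be fitted
    on held-out data.
    """
    return 1.0 / (1.0 + math.exp(a * f + b))


def word_log_score(svm_outputs, viseme_sequence):
    """Score the best alignment of frames to a word's viseme sequence.

    svm_outputs: one dict per frame, mapping viseme label -> raw SVM output.
    Assumed lattice: left-to-right, each viseme covers at least one frame,
    and the only moves are "stay" or "advance", both unweighted.
    """
    n_frames, n_states = len(svm_outputs), len(viseme_sequence)
    if n_frames < n_states:
        return NEG_INF  # not enough frames to visit every viseme

    # Log posterior of each viseme in the sequence, for each frame.
    logp = [
        [math.log(sigmoid_posterior(frame[v])) for v in viseme_sequence]
        for frame in svm_outputs
    ]

    # best[s] = best log score of any path ending in state s at this frame.
    best = [NEG_INF] * n_states
    best[0] = logp[0][0]
    for t in range(1, n_frames):
        new = [NEG_INF] * n_states
        for s in range(n_states):
            prev = best[s]                      # stay in the same viseme
            if s > 0:
                prev = max(prev, best[s - 1])   # advance from the previous one
            if prev > NEG_INF:
                new[s] = prev + logp[t][s]
        best = new
    return best[-1]  # the path must end in the last viseme


def recognize(svm_outputs, lexicon):
    """Return the word whose viseme sequence gets the highest lattice score."""
    return max(lexicon, key=lambda w: word_log_score(svm_outputs, lexicon[w]))


# Hypothetical viseme labels and lexicon, for illustration only.
lexicon = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}


def frame(active):
    """Toy frame: the active viseme's SVM fires (+3), all others reject (-3)."""
    return {v: (3.0 if v == active else -3.0) for v in ["w", "ah", "n", "t", "uw"]}


frames = [frame("w"), frame("ah"), frame("ah"), frame("n")]
best_word = recognize(frames, lexicon)
```

Working in the log domain avoids numerical underflow when many per-frame probabilities are multiplied along a path.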
Pages: 1248-1259
Page count: 12