The physiological microphone (PMIC): A competitive alternative for speaker assessment in stress detection and speaker verification

Cited by: 25
Authors
Patil, Sanjay A. [1]
Hansen, John H. L. [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect Engn, CRSS, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Keywords
Physiological sensor; Stress detection; Speaker verification; Non-acoustic sensor; PMIC; AUTOMATIC SPEECH RECOGNITION; HEART-RATE; CLASSIFICATION; NOISE; COMPENSATION;
DOI
10.1016/j.specom.2009.11.006
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Some interactive speech system scenarios require the user to perform tasks that constrain speech production, which increases speaker variability and degrades speech system performance. In noisy, stressful scenarios, even if the noise could be completely eliminated, the production variability brought on by stress, including the Lombard effect, has a more pronounced impact on speech system performance. This study therefore examines the physiological microphone (PMIC), used as a silent speech interface, and presents an experimental assessment of its utility for stress detection and speaker verification. The study compares the PMIC against a close-talk microphone (CTM) and reports that the PMIC performs as well as or better than the CTM under a number of test conditions. The PMIC reflects both stress-related and speaker-dependent information to a far greater extent than the CTM. For stress detection (reported as % accuracy), the PMIC-based system performs on par with the CTM-based system or about 2% better. For speaker verification, the PMIC outperforms the CTM in all matched stress conditions: the PMIC achieves equal error rates (EER) of 0.91%, 0.45%, and 1.42%, compared with 1.69%, 1.49%, and 1.80% for the CTM. This indicates that the PMIC reflects speaker-dependent information. A further advantage of the PMIC is its ability to record the user's physiological traits and state. Our experiments illustrate that the PMIC can be an attractive alternative for stress detection and speaker verification, with the added ability to record physiological information, in situations where a CTM may hinder operations (deep-sea divers, firefighters in rescue operations, etc.). (C) 2009 Elsevier B.V. All rights reserved.
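As a rough worked check of the speaker verification figures quoted in the abstract, the sketch below computes the relative EER reduction of the PMIC over the CTM for each of the three reported pairs. The condition labels and the function names are placeholders assumed for illustration; the abstract does not name the three matched stress conditions, and this arithmetic is not part of the original study.

```python
# EER pairs in percent, copied from the abstract as (label, PMIC, CTM).
# The labels are hypothetical placeholders: the abstract does not name
# the three matched stress conditions.
EER_PAIRS = [
    ("condition 1", 0.91, 1.69),
    ("condition 2", 0.45, 1.49),
    ("condition 3", 1.42, 1.80),
]


def relative_reduction(pmic_eer, ctm_eer):
    """Relative EER reduction of PMIC over CTM, in percent."""
    return 100.0 * (ctm_eer - pmic_eer) / ctm_eer


def summarize(pairs):
    """Return (label, relative reduction rounded to one decimal) per pair."""
    return [
        (label, round(relative_reduction(pmic, ctm), 1))
        for label, pmic, ctm in pairs
    ]


if __name__ == "__main__":
    for label, reduction in summarize(EER_PAIRS):
        print(f"{label}: {reduction}% relative EER reduction")
```

Under these figures the relative reductions come out at roughly 46%, 70%, and 21%, so the size of the PMIC advantage appears to vary considerably across the three conditions.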
Pages: 327-340
Page count: 14