Relevance of time-frequency features for phonetic and speaker-channel classification

Cited by: 69
Authors
Yang, HH [1]
Van Vuuren, S [1]
Sharma, S [1]
Hermansky, H [1]
Affiliations
[1] Oregon Grad Inst Sci & Technol, Dept Elect & Comp Engn, Beaverton, OR 97006 USA
Funding
National Science Foundation (USA);
Keywords
mutual information; sources of variability; spectral feature; input selection; phonetic classification; multi-layer perceptron;
DOI
10.1016/S0167-6393(00)00007-8
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
The mutual information concept is used to study the distribution of speech information in frequency and in time. The main focus is on the information that is relevant for phonetic classification. A large database of hand-labeled fluent speech is used to (a) compute the mutual information (MI) between a phonetic classification variable and one spectral feature variable in the time-frequency plane, and (b) compute the joint mutual information (JMI) between the phonetic classification variable and two feature variables in the time-frequency plane. The MI and the JMI of the feature variables are used as relevance measures to select inputs for phonetic classifiers. Multi-layer perceptron (MLP) classifiers with one or two inputs are trained to recognize phonemes to examine the effectiveness of the input selection method based on the MI and the JMI. To analyze the non-linguistic sources of variability, we use speaker-channel labels to represent different speakers and different telephone channels and estimate the MI between the speaker-channel variable and one or two feature variables. (C) 2000 Elsevier Science B.V. All rights reserved.
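The two relevance measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the MI and the JMI were estimated, so the plug-in histogram estimator, the equal-width binning, and the function names `quantize`, `discrete_mi`, `mi` and `jmi` below are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def quantize(x, n_bins):
    """Map a continuous feature to bin indices 0..n_bins-1 (equal-width bins: an assumption)."""
    x = np.asarray(x, dtype=float).ravel()
    edges = np.histogram_bin_edges(x, bins=n_bins)
    return np.digitize(x, edges[1:-1])  # use interior edges only

def discrete_mi(a, b):
    """Plug-in estimate of I(a; b) in bits from the joint histogram of two discrete variables."""
    _, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)   # count co-occurrences
    joint /= joint.sum()                    # normalize to a joint probability table
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / outer[nz])))

def mi(labels, feature, n_bins=16):
    """MI between a class label and one quantized feature."""
    return discrete_mi(labels, quantize(feature, n_bins))

def jmi(labels, feat_a, feat_b, n_bins=8):
    """JMI between a class label and a pair of quantized features, treated as one variable."""
    pair = quantize(feat_a, n_bins) * n_bins + quantize(feat_b, n_bins)
    return discrete_mi(labels, pair)

# Toy data (made up): the label is the XOR of two binary features.
a = np.array([0.0, 0.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 0.0, 1.0])
labels = np.array([0, 1, 1, 0])
print(mi(labels, a, n_bins=2), mi(labels, b, n_bins=2), jmi(labels, a, b, n_bins=2))
```

In this toy case each feature alone carries no information about the label, while the pair determines it completely, which is one reason a joint measure over two features can rank inputs differently from the single-feature MI.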
Pages: 35-50
Number of pages: 16