Speaker Recognition Using Neural Networks and Conventional Classifiers

Cited by: 116
Authors
Farrell, Kevin R. [1]
Mammone, Richard J. [1]
Assaleh, Khaled T. [1]
Affiliations
[1] Rutgers State Univ, CAIP Ctr, Piscataway, NJ 08854 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1994, Vol. 2, No. 1
Keywords
Neural networks;
DOI
10.1109/89.260362
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
An evaluation of various classifiers for text-independent speaker recognition is presented. In addition, a new classifier is examined for this application. The new classifier is called the modified neural tree network (MNTN). The MNTN is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks. The MNTN differs from the standard NTN in both the new learning rule used and the pruning criteria. The MNTN is evaluated in several speaker recognition experiments. These include closed- and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 38 speakers from the same dialect region. The MNTN is compared with nearest neighbor classifiers, full-search and tree-structured vector quantization (VQ) classifiers, multilayer perceptrons (MLPs), and decision trees. For closed-set speaker identification experiments, the full-search VQ classifier and MNTN demonstrate comparable performance. Both methods perform significantly better than the other classifiers for this task. The MNTN and full-search VQ classifiers are also compared for several speaker verification and open-set speaker identification experiments. The MNTN is found to perform better than full-search VQ classifiers for both of these applications. In addition to matching or exceeding the performance of the VQ classifier for these applications, the MNTN also provides a logarithmic saving for retrieval.
Pages: 194-205
Page count: 12