Dialect/accent classification using unrestricted audio

Cited by: 33
Authors
Huang, Rongqing [1]
Hansen, John H. L. [1]
Angkititrakul, Pongtep [1]
Affiliations
[1] Univ Texas, Dept Elect Engn, Ctr Robust Speech Syst, Richardson, TX 75083 USA
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2007, Vol. 15, No. 2
Keywords
accent/dialect classification; AdaBoost algorithm; context adapted training; dialect dependency information; limited training data; robust acoustic modeling; word-based modeling
DOI
10.1109/TASL.2006.881695
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This study addresses novel advances in English dialect/accent classification. A word-based modeling technique is proposed that is shown to outperform a large vocabulary continuous speech recognition (LVCSR)-based system at significantly lower computational cost. The new algorithm, named Word-based Dialect Classification (WDC), converts the text-independent decision problem into a text-dependent decision problem and produces multiple combination decisions at the word level rather than making a single decision at the utterance level. The basic WDC algorithm also provides options for further modeling and decision strategy improvement. Two sets of classifiers are employed for WDC: a word classifier $D_{W_k}$ and an utterance classifier $D_u$. $D_{W_k}$ is boosted via the AdaBoost algorithm directly in the probability space instead of the traditional feature space. $D_u$ is boosted via the dialect dependency information of the words. For a small training corpus, it is difficult to obtain a robust statistical model for each word and each dialect. Therefore, a context adapted training (CAT) algorithm is formulated, which adapts the universal phoneme Gaussian mixture models (GMMs) to dialect-dependent word hidden Markov models (HMMs) via linear regression. Three separate dialect corpora are used in the evaluations: the Wall Street Journal (American and British English), NATO N4 (British, Canadian, Dutch, and German accented English), and IViE (eight British dialects). Significant improvement in dialect classification is achieved for all corpora tested.
Pages: 453-464
Number of pages: 12
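
The abstract describes making one decision per word and then combining those decisions at the utterance level. A minimal sketch of that idea follows. It is not the paper's implementation: the function names, the weighted-vote combination rule, the example scores, and the `weights` argument (a hypothetical stand-in for per-word dialect dependency information) are all assumptions made for illustration.

```python
from collections import Counter


def classify_word(word_scores):
    """Pick the dialect with the highest log-likelihood for a single word."""
    return max(word_scores, key=word_scores.get)


def classify_utterance(words, weights=None):
    """Combine word-level decisions into one utterance-level decision.

    words:   list of (word, {dialect: log_likelihood}) pairs.
    weights: optional {word: weight}; a hypothetical stand-in for how
             strongly each word depends on dialect. Missing words get 1.0.
    """
    votes = Counter()
    for word, scores in words:
        weight = 1.0 if weights is None else weights.get(word, 1.0)
        votes[classify_word(scores)] += weight
    # The dialect with the largest accumulated (weighted) vote wins.
    return max(votes, key=votes.get)


# Made-up scores, for illustration only.
utterance = [
    ("water", {"US": -41.2, "UK": -39.8}),
    ("the", {"US": -12.1, "UK": -12.3}),
    ("schedule", {"US": -55.0, "UK": -51.5}),
]

print(classify_utterance(utterance))
print(classify_utterance(utterance, weights={"the": 0.1, "schedule": 2.0}))
```

Down-weighting a function word such as "the" reflects the intuition that some words carry little dialect information. How the paper actually derives and applies its word weights is not specified in the abstract.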