Data Augmentation for Deep Neural Network Acoustic Modeling

Cited by: 250

Authors:

Cui, Xiaodong [1]

Goel, Vaibhava [1]

Kingsbury, Brian [1]

Affiliations:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2015 / Vol. 23 / No. 09

Keywords:

Data augmentation; stochastic feature mapping; deep neural networks; automatic speech recognition; keyword search;

DOI:

10.1109/TASLP.2015.2438544

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper investigates data augmentation for deep neural network acoustic modeling based on label-preserving transformations to deal with data sparsity. Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for both deep neural networks (DNNs) and convolutional neural networks (CNNs). The approaches are focused on increasing speaker and speech variations of the limited training data such that the acoustic models trained with the augmented data are more robust to such variations. In addition, a two-stage data augmentation scheme based on a stacked architecture is proposed to combine VTLP and SFM as complementary approaches. Experiments are conducted on Assamese and Haitian Creole, two development languages of the IARPA Babel program, and improved performance on automatic speech recognition (ASR) and keyword search (KWS) is reported.


Pages: 1469-1477

Page count: 9

