Application of Deep Belief Networks for Natural Language Understanding

Cited by: 381

Authors:

Sarikaya, Ruhi ^{[1]}

Hinton, Geoffrey E. ^{[2]}

Deoras, Anoop ^{[1]}

Affiliations:

[1] Microsoft Corp, Redmond, WA 98052 USA

[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / No. 4

Keywords:

Call-Routing; DBN; Deep Learning; Deep Neural Nets; Natural Language Understanding; RBM

DOI:

10.1109/TASLP.2014.2303296

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Applications of Deep Belief Nets (DBN) to various problems have been the subject of a number of recent studies ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called Contrastive Divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVM), boosting and Maximum Entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, performed better than both MaxEnt and Boosting.
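The pretraining step named in the abstract can be illustrated with a minimal sketch of one-step Contrastive Divergence (CD-1) for a single binary restricted Boltzmann machine, the building block that is stacked to form a DBN. This is not the authors' implementation: the function names, the toy data, the learning rate, and the layer sizes are all assumptions made for illustration, and layer stacking and backpropagation fine-tuning are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    # P(h_j = 1 | v) for every hidden unit j
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # P(v_i = 1 | h) for every visible unit i
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    # Draw a binary state for each unit from its Bernoulli probability
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_step(v0, W, b, c, lr, rng):
    """Apply one CD-1 update for one binary visible vector (in place)."""
    ph0 = hidden_probs(v0, W, c)    # positive phase, driven by the data
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b)   # one-step reconstruction (probabilities)
    ph1 = hidden_probs(pv1, W, c)   # negative phase, driven by the reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    # Squared reconstruction error, useful only as a rough progress signal
    return sum((a - r) ** 2 for a, r in zip(v0, pv1))


def train_rbm(data, n_hidden, epochs=200, lr=0.1, seed=0):
    # Hyperparameter defaults are illustrative assumptions, not values from the paper
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    errors = []
    for _ in range(epochs):
        total = sum(cd1_step(v, W, b, c, lr, rng) for v in data)
        errors.append(total / len(data))
    return W, b, c, errors


# Hypothetical toy data: two binary bag-of-words vectors over a 6-word vocabulary
toy = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
W, b, c, errors = train_rbm(toy, n_hidden=2)

# The hidden activations are the learned features. In a DBN they would feed the
# next RBM, and the trained weights would initialize a feed-forward network.
features = [hidden_probs(v, W, c) for v in toy]
```

Reconstructing with probabilities rather than sampled visible states is a common practical choice for CD-1, not something the abstract specifies.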

Pages: 778-784

Page count: 7
