Application of Deep Belief Networks for Natural Language Understanding

Cited by: 381
Authors
Sarikaya, Ruhi [1]
Hinton, Geoffrey E. [2]
Deoras, Anoop [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
Keywords
Call-Routing; DBN; Deep Learning; Deep Neural Nets; Natural Language Understanding; RBM
DOI
10.1109/TASLP.2014.2303296
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Applications of Deep Belief Nets (DBN) to various problems have been the subject of a number of recent studies, ranging from image classification and speech recognition to audio classification. In this study, we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called Contrastive Divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data, and the features discovered by this model are then used to initialize a feed-forward neural network, which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVM), Boosting, and Maximum Entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, performed better than both MaxEnt and Boosting.
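The pretraining procedure the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary input features (for example, word-presence vectors), uses a single Gibbs step (CD-1), and trains on the full batch at every epoch. The function names (cd1_update, train_rbm, pretrain_stack) and the hyperparameters are hypothetical and chosen only for illustration. The returned weights and hidden biases are what would initialize a feed-forward network before fine-tuning with backpropagation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 step on a batch v0 of visible vectors (shape: batch x n_vis)."""
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step down to the visibles, then up again.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # CD-1 approximation to the log-likelihood gradient, averaged over the batch.
    n = v0.shape[0]
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid

def train_rbm(data, n_hidden, epochs=20, lr=0.1, seed=0):
    """Train a single RBM with full-batch CD-1 (an illustrative simplification)."""
    rng = np.random.default_rng(seed)
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_vis = np.zeros(n_vis)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        W, b_vis, b_hid = cd1_update(data, W, b_vis, b_hid, rng, lr)
    return W, b_vis, b_hid

def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each RBM is trained on the
    hidden activations produced by the layer below it."""
    params, x = [], data
    for n_hidden in layer_sizes:
        W, _, b_hid = train_rbm(x, n_hidden, **kwargs)
        params.append((W, b_hid))      # initial weights for a feed-forward layer
        x = sigmoid(x @ W + b_hid)     # features passed up to the next RBM
    return params

if __name__ == "__main__":
    # Toy unlabeled data: 8 utterances over a 6-word vocabulary (made up).
    rng = np.random.default_rng(1)
    toy = (rng.random((8, 6)) < 0.5).astype(float)
    stack = pretrain_stack(toy, [4, 3], epochs=5)
    print([W.shape for W, _ in stack])
```

Because no labels are used at this stage, additional unlabeled utterances can simply be appended to the data passed to pretrain_stack, which corresponds to the setting in which the abstract reports gains over SVMs.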
Pages: 778-784
Page count: 7