Convolutional feature learning and Hybrid CNN-HMM for scene number recognition

Cited by: 41
Authors
Guo, Qiang [1]
Wang, Fenglei [1]
Lei, Jun [1]
Tu, Dan [1]
Li, Guohui [1]
Affiliations
[1] Natl Univ Def Technol, Coll Informat Syst & Management, Changsha 410072, Hunan, Peoples R China
Keywords
Scene text recognition; Convolutional neural network; Hidden Markov model; Deep learning; Hybrid NN-HMM; GMM-HMM; NEURAL-NETWORKS
DOI
10.1016/j.neucom.2015.07.135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we investigate the recognition of house numbers captured in street view images. We formulate the problem as sequence recognition and present an integrated model that combines a Convolutional Neural Network (CNN) with a Hidden Markov Model (HMM). Our method uses the representation capability of the CNN to model the highly variable appearance of digits, while the HMM handles the dynamics of the image sequence. The two are combined in a hybrid way to form the Hybrid CNN-HMM. With this model, both training and recognition are performed at the whole-image level without explicit segmentation. The model makes CNNs applicable to dynamic problems. Experiments show that the Hybrid CNN-HMM dramatically improves on the performance of the Gaussian Mixture Model (GMM)-HMM. We evaluate different local features, e.g. LBP, SIFT and HOG, as observations fed into the HMM and find that CNN features consistently surpass these hand-engineered features in recognition accuracy. To gain insight into the performance differences between the features, we map them from the high-dimensional space to a 2-D plane with the t-SNE algorithm to visualize their semantic clustering with respect to the task. The visualization clearly supports the effectiveness of the features learnt by the CNN. (C) 2015 Elsevier B.V. All rights reserved.
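The abstract does not spell out how the CNN and the HMM are coupled. A minimal sketch follows, assuming the convention common to hybrid NN-HMM systems: per-frame state posteriors (for example, CNN softmax outputs over sliding windows) are divided by state priors to obtain scaled likelihoods, which then serve as HMM emission scores in Viterbi decoding. The function name `viterbi_hybrid`, the two-state toy model, and all numbers are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def viterbi_hybrid(posteriors, priors, log_trans, log_init):
    """Most likely state path given per-frame state posteriors.

    posteriors: (T, S) array, row t holds p(state | frame t), e.g. from a CNN.
    priors:     (S,) array of state priors p(state).
    log_trans:  (S, S) array, log_trans[i, j] = log p(next = j | current = i).
    log_init:   (S,) array of log initial-state probabilities.
    """
    eps = 1e-12  # guards against log(0)
    # Assumed hybrid convention: p(frame | state) is proportional to
    # p(state | frame) / p(state), so use that ratio as the emission score.
    log_emit = np.log(posteriors + eps) - np.log(priors + eps)
    n_frames, n_states = log_emit.shape
    delta = np.empty((n_frames, n_states))           # best log score per state
    back = np.zeros((n_frames, n_states), dtype=int)  # best predecessor state
    delta[0] = log_init + log_emit[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical toy input: 4 frames, 2 states.
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
priors = np.array([0.5, 0.5])
log_trans = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))
log_init = np.log(np.array([0.5, 0.5]))
path = viterbi_hybrid(posteriors, priors, log_trans, log_init)
print(path)
```

Decoding over the whole frame sequence in this way is what would let recognition proceed without explicitly segmenting the digits first: the state path itself implies where one digit ends and the next begins.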
Pages: 78-90
Page count: 13