Convolutional feature learning and Hybrid CNN-HMM for scene number recognition

Cited by: 41
Authors
Guo, Qiang [1]
Wang, Fenglei [1]
Lei, Jun [1]
Tu, Dan [1]
Li, Guohui [1]
Affiliation
[1] National University of Defense Technology, College of Information Systems and Management, Changsha 410072, Hunan, People's Republic of China
Keywords
Scene text recognition; Convolutional neural network; Hidden Markov model; Deep learning; Hybrid NN-HMM; GMM-HMM; NEURAL-NETWORKS;
DOI
10.1016/j.neucom.2015.07.135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we investigate the recognition of house numbers captured in street view images. We formulate the problem as sequence recognition and present an integrated model that combines a Convolutional Neural Network (CNN) with a Hidden Markov Model (HMM). Our method exploits the representational capability of the CNN to model the highly variable appearance of digits, while the HMM handles the dynamics of the image sequence. The two are combined in a hybrid fashion to form the Hybrid CNN-HMM. With this model, both training and recognition are performed at the whole-image level without explicit segmentation, which makes the CNN applicable to dynamic problems. Experiments show that the Hybrid CNN-HMM dramatically improves on the performance of a Gaussian Mixture Model (GMM)-HMM. We also evaluate different local features, e.g. LBP, SIFT and HOG, as observations fed into the HMM, and find that CNN features consistently surpass these hand-engineered features in recognition accuracy. To gain insight into the performance differences among the features, we map them from the high-dimensional feature space to a 2-D plane with the t-SNE algorithm and visualize their semantic clustering with respect to the task. The visualization clearly demonstrates the effectiveness of the features learnt by the CNN. (C) 2015 Elsevier B.V. All rights reserved.
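The hybrid idea in the abstract can be illustrated with a minimal sketch of the standard hybrid NN-HMM decoding recipe. This is an assumption about the general technique, not the authors' implementation. The image is cut into fixed-width frames by a left-to-right sliding window. A classifier (here a hypothetical CNN, replaced by made-up posterior values) outputs state posteriors p(s|x) for each frame. These are divided by the state priors p(s) to obtain scaled likelihoods that can serve as HMM emission scores, and Viterbi decoding then finds the best state path. The function names, window width, stride and the two-state left-to-right HMM are illustrative choices only.

```python
import numpy as np


def sliding_window_frames(image, width, stride):
    """Cut a 2-D grayscale image (H x W) into overlapping fixed-width frames,
    scanned left to right, so a whole image becomes an observation sequence.
    Hypothetical helper: width and stride are illustrative, not from the paper."""
    _, w = image.shape
    starts = range(0, w - width + 1, stride)
    return np.stack([image[:, s:s + width] for s in starts])  # (T, H, width)


def scaled_log_likelihoods(posteriors, priors, eps=1e-12):
    """Hybrid NN-HMM emission scores: log p(x|s) = log p(s|x) - log p(s) + const.
    posteriors: (T, S) per-frame state posteriors from a classifier (e.g. a CNN).
    priors: (S,) state priors, e.g. state frequencies in a training alignment."""
    return np.log(posteriors + eps) - np.log(priors + eps)


def viterbi(log_obs, log_trans, log_init):
    """Most likely HMM state path given per-frame log emission scores.
    log_obs: (T, S); log_trans[i, j] = log p(state j at t | state i at t-1);
    log_init: (S,) log initial-state probabilities."""
    T, S = log_obs.shape
    delta = log_init + log_obs[0]            # best log score ending in each state
    back = np.zeros((T, S), dtype=int)       # back-pointers to best predecessor
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (previous state, current state)
        back[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + log_obs[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):            # follow back-pointers from the end
        path.append(int(back[t, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    # Toy frame cut of a blank 32 x 100 image (independent of the decoding demo).
    frames = sliding_window_frames(np.zeros((32, 100)), width=16, stride=4)
    print(frames.shape)  # → (22, 32, 16)

    # Made-up posteriors for 4 frames and a 2-state left-to-right HMM,
    # standing in for CNN outputs (not data from the paper).
    posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
    priors = np.array([0.5, 0.5])
    with np.errstate(divide="ignore"):       # log(0) -> -inf marks forbidden moves
        log_trans = np.log(np.array([[0.5, 0.5], [0.0, 1.0]]))
        log_init = np.log(np.array([1.0, 0.0]))
    log_obs = scaled_log_likelihoods(posteriors, priors)
    print(viterbi(log_obs, log_trans, log_init))  # → [0, 0, 1, 1]
```

Dividing by the prior matters because a discriminatively trained network outputs posteriors, while the HMM expects likelihoods. Without the division, states that are frequent in the training data would be systematically favored during decoding.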
Pages: 78-90
Page count: 13