FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

Cited by: 215
Authors
Guan, Yijin [1, 3]
Liang, Hao [2, 3]
Xu, Ningyi [3]
Wang, Wenqiang [3]
Shi, Shaoshuai [3]
Chen, Xi [3]
Sun, Guangyu [1, 5]
Zhang, Wei [2]
Cong, Jason [1, 4, 5, 6, 7]
Affiliations
[1] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Hong Kong, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[5] PKU UCLA Joint Res Inst Sci & Engn, Los Angeles, CA USA
[6] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[7] Peking Univ, Beijing, Peoples R China
Source
2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2017) | 2017
DOI
10.1109/FCCM.2017.25
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deep Neural Networks (DNNs) have demonstrated great success in numerous applications such as image classification, speech recognition, and video analysis. However, DNNs are much more computation-intensive and memory-intensive than earlier shallow models, which makes them challenging to deploy in both large-scale data centers and real-time embedded systems. Considering performance, flexibility, and energy efficiency, FPGA-based accelerators for DNNs are a promising solution. Unfortunately, conventional accelerator design flows make it difficult for FPGA developers to keep up with the fast pace of innovation in DNNs. To overcome this problem, we propose FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input and automatically generates hardware implementations on FPGA boards using RTL-HLS hybrid templates. FP-DNN performs DNN model inference with our high-performance computation engine and carefully designed communication optimization strategies. We implement CNNs, LSTM-RNNs, and Residual Nets with FP-DNN, and experimental results demonstrate the performance and flexibility of the proposed framework.
Pages: 152-159
Page count: 8