Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Cited by: 1492
Authors
Chen, Yu-Hsin [1]
Krishna, Tushar [1,2]
Emer, Joel S. [1,3]
Sze, Vivienne [1]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[3] Nvidia Corp, Westford, MA 01886 USA
Keywords
Convolutional neural networks (CNNs); dataflow processing; deep learning; energy-efficient accelerators; spatial architecture; COPROCESSOR;
DOI
10.1109/JSSC.2016.2616357
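Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;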
Abstract
Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also pose throughput and energy-efficiency challenges for the underlying hardware. This is because their computation requires a large amount of data, creating significant on-chip and off-chip data movement that is more energy-consuming than the computation itself. Minimizing the energy cost of data movement for any CNN shape is therefore the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. The RS dataflow reconfigures the computation mapping for a given shape, which optimizes energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM accesses per multiply-and-accumulate (MAC) for AlexNet at 278 mW (batch size N = 4), and at 0.7 frames/s and 0.0035 DRAM accesses/MAC for VGG-16 at 236 mW (N = 3).
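The figures quoted in the abstract can be turned into two derived quantities that are easier to compare across networks: average energy per frame and the number of MACs performed per DRAM access. The following is a minimal sketch of that arithmetic only. The names RESULTS, energy_per_frame_mj and macs_per_dram_access are illustrative and do not come from the paper, and the sketch assumes that the quoted power is an average over the frame. The abstract does not say whether that power figure includes off-chip DRAM, so the energy values should be read as covering whatever the authors measured.

```python
# Figures quoted in the abstract (convolutional layers only).
# The dictionary layout and the names below are illustrative assumptions.
RESULTS = {
    "AlexNet": {"fps": 35.0, "power_mw": 278.0, "dram_per_mac": 0.0029},
    "VGG-16": {"fps": 0.7, "power_mw": 236.0, "dram_per_mac": 0.0035},
}


def energy_per_frame_mj(power_mw: float, fps: float) -> float:
    """Average energy per frame in millijoules.

    mW divided by frames/s gives mJ/frame, assuming the quoted power
    is the average power while frames are being processed.
    """
    return power_mw / fps


def macs_per_dram_access(dram_per_mac: float) -> float:
    """Reciprocal of DRAM accesses per MAC: MACs done per DRAM access."""
    return 1.0 / dram_per_mac


if __name__ == "__main__":
    for name, r in RESULTS.items():
        energy = energy_per_frame_mj(r["power_mw"], r["fps"])
        reuse = macs_per_dram_access(r["dram_per_mac"])
        print(f"{name}: {energy:.1f} mJ/frame, {reuse:.0f} MACs per DRAM access")
```

Under these assumptions, AlexNet works out to roughly 7.9 mJ per frame and VGG-16 to roughly 337 mJ per frame, with on the order of 300 MACs performed for each DRAM access in both cases.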
Pages: 127-138
Page count: 12