Hierarchical Cache Directory for CMP

Citations: 21
Authors
Guo, Song-Liu [1]
Wang, Hai-Xia [2]
Xue, Yi-Bo [2]
Li, Chong-Min [1]
Wang, Dong-Sheng [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cache coherence protocol; hierarchical directory; chip multiprocessor; architecture
DOI
10.1007/s11390-010-9321-5
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As more processing cores are integrated into one chip and feature size continues to shrink, the average access latency for remote nodes under a directory-based coherence protocol increases, which greatly degrades system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core but offer little improvement for neighboring nodes. Other techniques, such as in-transit optimization, try to reduce latency at the cost of increased storage. This paper introduces a hierarchical cache directory into the CMP (chip multiprocessor), which divides CMP tiles into multiple regions hierarchically, and combines it with data replication. A new directory organization is proposed to record the sharing status within a region and to assist the regional home in completing operations efficiently. Simulation results show that for a 16-core CMP, compared to a traditional directory, the hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average, while using less storage. Theoretical analyses show that for a 2^n x 2^n tiled CMP, the average access latency in the hierarchical cache directory asymptotically approaches a function that is independent of n; hence the architecture is highly scalable.
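The hierarchical division of tiles into regions can be pictured with a small sketch. The abstract does not specify how regions are formed, so the code below assumes a quadtree-style scheme in which each level merges 2 x 2 regions of the level beneath it. The function names `region_id`, `smallest_common_level`, and `hops` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: assumes quadtree-style regions on a 2^n x 2^n tile grid.
# This is NOT the paper's directory organization, just one way to picture
# "dividing CMP tiles into multiple regions hierarchically".

def region_id(x: int, y: int, level: int) -> tuple[int, int]:
    """Return (column, row) of the level-`level` region containing tile (x, y).

    Level 0 regions are single tiles; each higher level merges 2 x 2 regions.
    """
    return (x >> level, y >> level)


def smallest_common_level(a: tuple[int, int], b: tuple[int, int], n: int) -> int:
    """Return the lowest level at which tiles a and b fall in the same region."""
    for level in range(n + 1):
        if region_id(a[0], a[1], level) == region_id(b[0], b[1], level):
            return level
    raise ValueError("tiles lie outside the 2^n x 2^n grid")


def hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Manhattan distance between two tiles on a mesh (assumed topology)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


if __name__ == "__main__":
    n = 2  # 4 x 4 grid, i.e. a 16-core CMP
    requester, sharer = (0, 0), (1, 1)
    print(smallest_common_level(requester, sharer, n))  # same level-1 region
    print(hops(requester, sharer))
```

Under this assumption, two tiles that share a low-level region are also few hops apart, which is the intuition behind serving a request from a nearby regional home rather than a distant global one.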
Pages: 246-256
Number of pages: 11