Cache Miss Equations: A compiler framework for analyzing and tuning memory behavior

Cited by: 127
Authors
Ghosh, S [1 ]
Martonosi, M [1 ]
Malik, S [1 ]
Affiliation
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Source
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS | 1999 / Vol. 21 / No. 4
Keywords
design; experimentation; performance; Cache memories; compilation; optimization; program transformation;
DOI
10.1145/325478.325479
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
With the ever-widening performance gap between processors and main memory, cache memory, which is used to bridge this gap, is becoming more and more significant. Caches work well for programs that exhibit sufficient locality. Other programs, however, have reference patterns that fail to exploit the cache, thereby suffering heavily from high memory latency. In order to get high cache efficiency and achieve good program performance, efficient memory accessing behavior is necessary. In fact, for many programs, program transformations or source-code changes can radically alter memory access patterns, significantly improving cache performance. Both hand-tuning and compiler optimization techniques are often used to transform codes to improve cache utilization. Unfortunately, cache conflicts are difficult to predict and estimate, precluding effective transformations. Hence, effective transformations require detailed knowledge about the frequency and causes of cache misses in the code. This article describes methods for generating and solving Cache Miss Equations (CMEs) that give a detailed representation of cache behavior, including conflict misses, in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop's memory behavior. While solving these equations is in general difficult, we show that it is also unnecessary, as mathematical techniques for manipulating Diophantine equations allow us to relatively easily compute and/or reduce the number of possible solutions, where each solution corresponds to a potential cache miss. The mathematical precision of CMEs allows us to find true optimal solutions for transformations such as blocking or padding. The generality of CMEs also allows us to reason about interactions between transformations applied in concert.
The article also gives examples of their use to determine array padding and offset amounts that minimize cache misses, and to determine optimal blocking factors for tiled code. Overall, these equations represent an analysis framework that offers the generality and precision needed for detailed compiler optimizations.
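As a rough illustration of the conflict misses and padding that the abstract mentions, the sketch below simulates a direct-mapped cache by brute force. It is not the CME method itself, which reasons symbolically about Diophantine equations rather than enumerating accesses. The function name `count_misses`, the loop shape (reading `a[i]` and `b[i]` in turn), and the cache parameters (8 KB cache, 32-byte lines, 8-byte elements) are all assumptions chosen for the example. Two accesses conflict when their memory lines map to the same cache set, which for these parameters happens whenever the two array bases differ by a multiple of the cache size.

```python
def count_misses(n, base_a, base_b, elem_size=8, cache_size=8192, line_size=32):
    """Count misses in a direct-mapped cache for: for i in range(n): use a[i], b[i].

    All parameters are illustrative assumptions, not values from the article.
    """
    num_sets = cache_size // line_size
    resident = [None] * num_sets  # memory line currently held by each cache set
    misses = 0
    for i in range(n):
        for base in (base_a, base_b):
            mem_line = (base + i * elem_size) // line_size
            cache_set = mem_line % num_sets
            if resident[cache_set] != mem_line:
                misses += 1                      # cold or conflict miss
                resident[cache_set] = mem_line   # evict whatever was there
    return misses


N = 1024
ARRAY_BYTES = N * 8  # 8192 bytes, equal to the assumed cache size

# b placed directly after a: a[i] and b[i] always map to the same set.
unpadded = count_misses(N, base_a=0, base_b=ARRAY_BYTES)

# b shifted by one cache line (32 bytes of padding): the mapping no longer collides.
padded = count_misses(N, base_a=0, base_b=ARRAY_BYTES + 32)

print(unpadded, padded)
```

Under these assumptions every access misses in the unpadded layout, while one line of padding leaves only the cold misses for the lines actually touched. An analytical framework such as the one described in the abstract aims to find such padding amounts without simulating every access.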
Pages: 703-746
Page count: 44