Memphis: Finding and Fixing NUMA-related Performance Problems on Multi-core Platforms

Cited by: 53
Authors
McCurdy, Collin [1]
Vetter, Jeffrey [1]
Affiliations
[1] Oak Ridge Natl Lab, Future Technol Grp, Oak Ridge, TN 37830 USA
Source
2010 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2010) | 2010
DOI
10.1109/ISPASS.2010.5452060
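Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
080201 [Mechanical manufacturing and automation];
Abstract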
Until recently, most high-end scientific applications have been immune to performance problems caused by Non-Uniform Memory Access (NUMA). However, current trends in micro-processor design are pushing NUMA to smaller and smaller scales. This paper examines the current state of NUMA and makes several contributions. First, we summarize the performance problems that NUMA can present for multi-threaded applications and describe methods of addressing them. Second, we demonstrate that NUMA can indeed be a significant problem for scientific applications, showing that it can mean the difference between an application scaling perfectly and failing to scale at all. Third, we describe, in increasing order of usefulness, three methods of using hardware performance counters to aid in finding NUMA-related problems. Finally, we introduce Memphis, a data-centric toolset that uses Instruction Based Sampling to help pinpoint problematic memory accesses, and demonstrate how we used it to improve the performance of several production-level codes - HYCOM, XGC1 and CAM - by 13%, 23% and 24% respectively.
Pages: 87-96
Page count: 10
Related papers
22 in total
[1]
Scaling to 150K cores: recent algorithm and performance engineering developments enabling XGC1 to run at scale [J].
Adams, Mark F. ;
Ku, Seung-Hoe ;
Worley, Patrick ;
D'Azevedo, Ed ;
Cummings, Julian C. ;
Chang, C-S .
SCIDAC 2009: SCIENTIFIC DISCOVERY THROUGH ADVANCED COMPUTING, 2009, 180
[2]
Adhianto L., 2010, CONCURRENCY COMPUTAT
[3]
Continuous profiling: Where have all the cycles gone? [J].
Anderson, JM ;
Berc, LM ;
Dean, J ;
Ghemawat, S ;
Henzinger, MR ;
Leung, STA ;
Sites, RL ;
Vandevoorde, MT ;
Waldspurger, CA ;
Weihl, WE .
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1997, 15 (04) :357-390
[4]
[Anonymous], 2009, BIOS KERN DEV GUID B
[5]
[Anonymous], 1991, P 1991 ACM IEEE C SU
[6]
[Anonymous], 2009, SOFTW OPT GUID AMD F
[7]
An oceanic general circulation model framed in hybrid isopycnic-Cartesian coordinates [J].
Bleck, Rainer .
OCEAN MODELLING, 2002, 4 (01) :55-88
[8]
Casazza J., 2009, First the tick, now the tock: Intel microarchitecture (nehalem)
[9]
Collins William D., 2006, J CLIMATE, V19
[10]
Dean J., 1997, P 30 ANN ACM IEEE IN