Hierarchical loop scheduling for clustered NUMA machines

Cited by: 4
Authors
Wang, YM [1]
Wang, HH
Chang, RC
Affiliations
[1] Providence Univ, Dept Comp Sci & Informat Management, Taichung 433, Taiwan
[2] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 30050, Taiwan
Keywords
DOI
10.1016/S0164-1212(00)00045-5
Chinese Library Classification
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Loop scheduling is an important issue in the development of high-performance multiprocessors. Because modern multiprocessors have high and non-uniform memory access (NUMA) costs, communication costs dominate the execution of parallel programs. Previous affinity algorithms perform better than dynamic algorithms on non-clustered NUMA multiprocessors, but they suffer heavy overheads when migrating workload on clustered NUMA machines. In this paper, we propose a new loop scheduling policy, the hierarchical policy, to improve various affinity scheduling algorithms (AFSs) for clustered NUMA machines. We cyclically distribute the iteration chunks to clusters. When imbalance occurs, the migration of iterations is carried out hierarchically. We use the hierarchical policy to improve AFS and modified AFS (MAFS), and we call the resulting algorithms Hierarchical AFS (HAFS) and Hierarchical MAFS (HMAFS), respectively. AFS uses a deterministic assignment policy to assign repeated executions of a loop iteration to the same processor. MAFS modifies the migration policy of AFS and reduces the number of synchronization operations. We confirm our idea by running many applications on a clustered NUMA simulator. Our experimental results show that the hierarchical policy reduces inter-cluster remote memory accesses, decreases the number of locks on the queues, and effectively balances the workload. We also show that HMAFS is the best choice among these algorithms in most cases. © 2000 Elsevier Science Inc. All rights reserved.
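The two mechanisms named in the abstract, cyclic distribution of chunks to clusters and hierarchical migration on imbalance, can be illustrated with a small sketch. It is not the authors' HAFS or HMAFS implementation: the abstract gives no queue layout, chunk sizes, or migration amounts, so every name here (`distribute_chunks`, `find_victim`, `next_chunk`), the one-queue-per-processor layout, the round-robin placement inside a cluster, and the choice to migrate a single chunk from the most loaded queue are assumptions made for illustration only.

```python
from collections import deque


def distribute_chunks(num_iterations, chunk_size, num_clusters, procs_per_cluster):
    """Deal iteration chunks cyclically to clusters (assumed layout).

    Returns queues[cluster][proc], each a deque of range objects.
    Placement inside a cluster is round-robin, which is an assumption.
    """
    queues = [[deque() for _ in range(procs_per_cluster)]
              for _ in range(num_clusters)]
    next_proc = [0] * num_clusters
    for k, start in enumerate(range(0, num_iterations, chunk_size)):
        chunk = range(start, min(start + chunk_size, num_iterations))
        cluster = k % num_clusters          # cyclic distribution to clusters
        proc = next_proc[cluster]
        queues[cluster][proc].append(chunk)
        next_proc[cluster] = (proc + 1) % procs_per_cluster
    return queues


def find_victim(queues, cluster, proc):
    """Pick a queue to migrate work from, searching hierarchically.

    Level 1: the most loaded other queue in the same cluster.
    Level 2: the most loaded queue in any other cluster.
    Returns (cluster, proc), or None if every other queue is empty.
    """
    def most_loaded(candidates):
        best = max(candidates,
                   key=lambda cp: len(queues[cp[0]][cp[1]]),
                   default=None)
        if best is not None and queues[best[0]][best[1]]:
            return best
        return None

    local = [(cluster, p) for p in range(len(queues[cluster])) if p != proc]
    victim = most_loaded(local)
    if victim is not None:
        return victim
    remote = [(c, p) for c in range(len(queues)) if c != cluster
              for p in range(len(queues[c]))]
    return most_loaded(remote)


def next_chunk(queues, cluster, proc):
    """Return (chunk, source) for an idle processor; chunk is None when done."""
    own = queues[cluster][proc]
    if own:
        return own.popleft(), "local"
    victim = find_victim(queues, cluster, proc)
    if victim is None:
        return None, "done"
    v_cluster, v_proc = victim
    chunk = queues[v_cluster][v_proc].pop()   # take from the tail of the victim
    source = "intra-cluster" if v_cluster == cluster else "inter-cluster"
    return chunk, source
```

In this sketch a processor crosses a cluster boundary only after every queue in its own cluster is empty, which is one plausible reading of how a hierarchical policy would keep inter-cluster remote accesses low. The locking of queues that the abstract mentions is omitted entirely.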
Pages: 33-44
Page count: 12