Resource allocation and utilization in the Blue Gene/L supercomputer

Cited by: 22
Authors
Aridor, Y
Domany, T
Goldshmidt, O
Moreira, JE
Shmueli, E
Affiliations
[1] IBM Corp, Div Res, Haifa Res Lab, IL-31905 Haifa, Israel
[2] IBM Corp, Syst & Technol Grp, Rochester, MN 55901 USA
DOI
10.1147/rd.492.0425
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes partition allocation for parallel jobs in the Blue Gene®/L supercomputer. It describes the novel network architecture of the Blue Gene/L (BG/L) three-dimensional (3D) computational core and presents a preliminary analysis of its properties and advantages compared with those of more traditional systems. The scalability challenge is solved in BG/L by sacrificing granularity of system management. The system is treated as a collection of composite allocation units that contain both processing and communication resources. We discuss the ensuing algorithmic framework for computational and communication resource allocation and present results of simulations that explore resource utilization of BG/L for different workloads. We find that utilization depends strongly on both the predominant partition topology (mesh or torus) and the 3D shapes requested by the running jobs. When communication links are treated as dedicated resources, it is much more difficult to allocate toroidal partitions than mesh ones, especially for jobs of more than one allocation unit in each dimension. We show that in these difficult cases, the advantage of BG/L compared with a 3D toroidal machine of the same size is very significant, with resource utilization better by a factor of 2. In the easier cases (e.g., predominantly mesh partitions), there are no disadvantages. The advantage is primarily due to the BG/L novel multi-toroidal topology that permits coallocation of multiple toroidal partitions at negligible additional cost.
Pages: 425-436
Page count: 12