Design and implementation of message-passing services for the Blue Gene/L supercomputer

Cited by: 21
Authors
Almási, G
Archer, C
Castaños, JG
Gunnels, JA
Erway, CC
Heidelberger, P
Martorell, X
Moreira, JE
Pinnow, K
Ratterman, J
Steinmacher-Burow, BD
Gropp, W
Toonen, B
Affiliations
[1] IBM Corp, Div Res, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Corp, Syst & Technol Grp, Rochester, MN 55901 USA
[3] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
[4] Tech Univ Catalonia, Barcelona 08034, Spain
[5] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
Keywords
DOI
10.1147/rd.492.0393
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
The Blue Gene®/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two BG/L modes of operation: the coprocessor mode and the virtual node mode. Performance measurements show that our message-passing services deliver performance close to the hardware limits of the machine. They also show that dedicating one of the processors of a node to communication functions (coprocessor mode) greatly improves the message-passing bandwidth, whereas running two processes per compute node (virtual node mode) can have a positive impact on application performance.
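For context only (this sketch is not code from the paper): the message-passing programs the abstract refers to use the standard MPI C API, as in this minimal point-to-point exchange between two ranks, the pattern underlying typical bandwidth and latency measurements.

/* Minimal MPI ping-pong between ranks 0 and 1; illustrative only,
 * using standard MPI calls, not BG/L-specific interfaces. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    char buf[1024] = "hello";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            /* Rank 0 sends a message to rank 1 and waits for the echo. */
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Rank 1 receives the message and echoes it back. */
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}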
Pages: 393-406
Page count: 14