MPI-LAPI: An efficient implementation of MPI for IBM RS/6000 SP systems

Cited by: 21
Authors
Banikazemi, M [1]
Govindaraju, RK
Blackmore, R
Panda, DK
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Corp, Commun Subsyst, Power Parallel Syst, Poughkeepsie, NY 12601 USA
[3] Ohio State Univ, Dept Comp & Informat Sci, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
Keywords
interprocessor communication; fast messaging layers; networks of workstations; Message Passing Interface (MPI); clustering;
DOI
10.1109/71.963419
CLC number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
The IBM RS/6000 SP system is one of the most cost-effective commercially available high-performance machines. IBM RS/6000 SP systems support the Message Passing Interface standard (MPI) and LAPI. LAPI is a low-level, reliable, and efficient one-sided communication API library implemented on IBM RS/6000 SP systems. This paper explains how the high performance of the LAPI library has been exploited to implement the MPI standard more efficiently than the existing MPI implementation. It describes how to avoid unnecessary data copies at both the sending and receiving sides in such an implementation. The resolution of problems arising from mismatches between the requirements of the MPI standard and the features of LAPI is discussed. As a result of this exercise, certain enhancements to LAPI are identified that enable an efficient implementation of MPI on LAPI. The performance of the new implementation of MPI is compared with that of the underlying LAPI itself. The latency (in polling and interrupt modes) and bandwidth of the new implementation are compared with those of the native MPI implementation on RS/6000 SP systems. The results indicate that the MPI implementation on LAPI performs comparably to or better than the original MPI implementation in most cases. Improvements of up to 17.3 percent in polling-mode latency, 35.8 percent in interrupt-mode latency, and 20.9 percent in bandwidth are obtained for certain message sizes. The implementation of MPI on top of LAPI also outperforms the native MPI implementation on the NAS Parallel Benchmarks.
Pages: 1081-1093
Number of pages: 13
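
Illustrative sketch. The abstract describes layering MPI's two-sided send/receive semantics on LAPI's one-sided active-message interface, where a header handler runs on the target and returns the address at which incoming data should be deposited; returning the user's posted receive buffer directly is what avoids the extra copy. The following C sketch models that flow in a single process. All names here (am_send, msg_header, header_handler, mini_recv) are hypothetical stand-ins, not the actual LAPI prototypes (LAPI_Amsend and friends), and the posted/unexpected queues are reduced to single slots; it is a minimal model of the technique, not the paper's implementation.

    /* Hedged sketch: a two-sided, MPI-style eager send layered on a
     * one-sided active-message API in the style of LAPI's header-handler
     * model.  All identifiers are hypothetical. */
    #include <stdio.h>
    #include <string.h>

    /* Envelope carried in the active message's user header (hypothetical). */
    typedef struct { int src, tag, len; } msg_header;

    /* One posted receive and one unexpected message, kept trivially simple. */
    static struct { int active; int src, tag; void *buf; } posted;
    static struct { int valid; msg_header hdr; char data[256]; } unexpected;

    /* Header handler: runs on the "target" when the message arrives and
     * must return the address where the payload should be deposited.  If a
     * matching receive is already posted, return the user buffer directly
     * (no staging copy); otherwise fall back to a staging buffer. */
    static void *header_handler(const msg_header *hdr) {
        if (posted.active && posted.src == hdr->src && posted.tag == hdr->tag) {
            posted.active = 0;
            return posted.buf;            /* early match: zero-copy delivery */
        }
        unexpected.valid = 1;
        unexpected.hdr = *hdr;
        return unexpected.data;           /* late match: one staging copy */
    }

    /* Hypothetical one-sided active-message send: in this single-process
     * model it invokes the target-side header handler, then moves payload. */
    static void am_send(const msg_header *hdr, const void *data) {
        void *dst = header_handler(hdr);
        memcpy(dst, data, (size_t)hdr->len);
    }

    /* MPI_Recv-like call layered on top of the active-message layer. */
    static void mini_recv(int src, int tag, void *buf, int len) {
        if (unexpected.valid && unexpected.hdr.src == src &&
            unexpected.hdr.tag == tag) {
            memcpy(buf, unexpected.data, (size_t)len);  /* drain staging copy */
            unexpected.valid = 0;
            return;
        }
        posted.active = 1; posted.src = src; posted.tag = tag; posted.buf = buf;
    }

    int main(void) {
        char out[] = "hello", in[8] = {0};
        msg_header hdr = { /*src*/0, /*tag*/7, /*len*/6 };

        mini_recv(0, 7, in, 6);   /* receive posted first ...              */
        am_send(&hdr, out);       /* ... so the payload lands in 'in'      */
        printf("%s\n", in);       /* directly, with no intermediate copy   */
        return 0;
    }

The early-match branch in header_handler is the copy-avoidance the abstract refers to: when the receive is posted before the data arrives, the payload is written straight into the user buffer, and only the late-arrival path pays for a staging copy.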