MPI-FM: High performance MPI on workstation clusters

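Times cited: 51
Authors
Lauria, M [1]
Chien, A [1]
Affiliation
[1] University of Illinois, Department of Computer Science, Urbana, IL 61801
Funding
U.S. National Science Foundation; U.S. National Aeronautics and Space Administration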
DOI
10.1006/jpdc.1996.1264
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202 [Computer software and theory]
Abstract
Despite the emergence of high speed LANs, the communication performance available to applications on workstation clusters still falls short of that available on MPPs. A new generation of efficient messaging layers is needed to take advantage of the hardware performance and to deliver it to the application level. Communication software is the key element in bridging the communication performance gap separating MPPs and workstation clusters. MPI-FM is a high performance implementation of the Message Passing Interface (MPI) for networks of workstations connected with a Myrinet network, built on top of the Fast Messages (FM) library. Based on FM version 1.1, released in Fall 1995, MPI-FM achieves a minimum one-way latency of 19 µs and a peak bandwidth of 17.3 Mbyte/s with common MPI send and receive function calls. A direct comparison using published performance figures shows that MPI-FM running on SPARCstation 20 workstations connected with a relatively inexpensive Myrinet network outperforms the MPI implementations available on the IBM SP2 and the Cray T3D, in both latency and bandwidth, for messages up to 2 kbyte in size. We describe the critical performance issues found in building a high level messaging library (MPI) on top of a low level messaging layer (FM), and the design solutions we adopted for them. One such issue was the direct and efficient support of common operations like adding and removing a header. Another was the exchange of critical information between the layers, like the location of the destination buffer. These two optimizations are both shown to be necessary, and their combination sufficient, to achieve the aforementioned level of performance. The performance contribution of each of these optimizations is examined in some detail.
These results delineate a new design approach for low level communication layers in which a closer integration with the upper layer and an appropriate balance of the communication pipeline stages are the key elements for high performance. (C) 1997 Academic Press.
Pages: 4-18
Page count: 15