BSP clusters: High performance, reliable and very low cost

Times cited: 5
Authors
Donaldson, SR [1]
Hill, JMD
Skillicorn, DB
Affiliations
[1] Univ Oxford, Comp Lab, Oxford OX1 3QD, England
[2] Sychron Ltd, Oxford, England
[3] Queens Univ, Kingston, ON, Canada
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
networks of workstations; bulk synchronous parallelism; low-latency communication; NAS parallel benchmarks
DOI
10.1016/S0167-8191(99)00103-9
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202 [Computer Software and Theory]
Abstract
We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100 Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses global knowledge of a program's communication pattern to maximise switch performance. Its performance is comparable to that of previous low-latency protocols on similar hardware, but because it adds reliability, application software can use it directly. For a modest budget of US$20,000 it is possible to build a machine that outperforms an IBM SP2 on all the NAS benchmarks (BT +80%, SP +70%, MG +9%, and LU +65%) and an SGI Origin 2000 on half of them (BT +10%, SP -24%, MG +10%, and LU -28%). The protocol has a CPU overhead of 1.5 μs for packet download and 3.6 μs for upload. Small packets can be communicated through the switch in a pipelined fashion every 21 μs. Application-to-application one-way latency is 29 μs plus the latency of the switch. A raw link bandwidth of 93 Mbps is achieved for 1400-byte packets and 50 Mbps for 128-byte packets. This scales to eight processors communicating at 91 Mbps per link, giving a sustained global bandwidth of 728 Mbps. © 2000 Elsevier Science B.V. All rights reserved.
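The throughput figures quoted in the abstract can be cross-checked with simple arithmetic. The following is a minimal Python sketch, assuming 1 Mbps = 10^6 bit/s and 8-bit bytes; the helper names (`per_packet_time_us`, `bandwidth_from_interval_mbps`, `aggregate_bandwidth_mbps`) are hypothetical illustrations, not part of the protocol or of BSPlib, and the sketch ignores Ethernet framing overhead.

```python
# Back-of-the-envelope checks on the figures quoted in the abstract.
# Assumptions (not from the paper): 1 Mbps = 10**6 bit/s, 1 byte = 8 bits,
# and no allowance for Ethernet framing overhead.

def per_packet_time_us(packet_bytes: int, bandwidth_mbps: float) -> float:
    """Time in microseconds to move one packet at the given bandwidth."""
    bits = packet_bytes * 8
    # bits / (Mbit/s) yields microseconds directly
    return bits / bandwidth_mbps


def bandwidth_from_interval_mbps(packet_bytes: int, interval_us: float) -> float:
    """Bandwidth (Mbps) implied by sending one packet every interval_us microseconds."""
    return packet_bytes * 8 / interval_us


def aggregate_bandwidth_mbps(processors: int, per_link_mbps: float) -> float:
    """Global bandwidth when every processor's link runs concurrently."""
    return processors * per_link_mbps


if __name__ == "__main__":
    print(f"1400-byte packet at 93 Mbps: {per_packet_time_us(1400, 93):.1f} us")
    print(f"128-byte packet at 50 Mbps: {per_packet_time_us(128, 50):.2f} us")
    print(f"128-byte packets every 21 us: {bandwidth_from_interval_mbps(128, 21):.1f} Mbps")
    print(f"8 links at 91 Mbps: {aggregate_bandwidth_mbps(8, 91):.0f} Mbps")
```

Read this way, the 21 μs pipelining interval and the 50 Mbps figure for 128-byte packets are consistent: one 128-byte packet every 21 μs carries roughly 49 Mbps. The 728 Mbps global figure is eight concurrent links at 91 Mbps each.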
Pages: 199-242
Number of pages: 44