Quantifying the performance differences between PVM and TreadMarks

Cited by: 23
Authors
Lu, HH
Dwarkadas, S
Cox, AL
Zwaenepoel, W
Affiliations
[1] UNIV ROCHESTER,DEPT COMP SCI,ROCHESTER,NY 14627
[2] RICE UNIV,DEPT COMP SCI,HOUSTON,TX 77005
Funding
US National Science Foundation
DOI
10.1006/jpdc.1997.1332
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper compares two systems for parallel programming on networks of workstations: Parallel Virtual Machine (PVM), a message-passing system, and TreadMarks, a software distributed shared-memory (DSM) system. The eight applications used in this comparison are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS), and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR) and Traveling Salesman (TSP). Two different input data sets are used for five of the applications. We use two execution environments. The first is a 155 Mbps ATM network with eight SPARC-20 model 61 workstations; the second is an eight-processor IBM SP/2. The differences in speedup between TreadMarks and PVM depend mostly on the applications, and only to a much lesser extent on the platform and the data set used. In particular, the TreadMarks speedup for six of the eight applications is within 15% of that achieved with PVM. For one application, the difference in speedup is between 15% and 30%, and for another, the difference is around 50%. We identified four important factors that contribute to the lower performance of TreadMarks: (1) extra messages due to the separation of synchronization and data transfer, (2) extra messages to handle access misses caused by the use of an invalidate protocol, (3) false sharing, and (4) diff accumulation for migratory data. We have quantified the effects of the last three factors by measuring the performance gain when each is eliminated. Of the three factors, TreadMarks' use of a separate request message per page of data accessed is the most important. The effect of false sharing is comparatively low. Reducing diff accumulation benefits migratory data only when the diffs completely overlap. When these performance impediments are removed, all of the TreadMarks programs perform within 25% of PVM, and for six out of eight experiments, TreadMarks is less than 5% slower than PVM. © 1997 Academic Press.
Pages: 65-78
Page count: 14