Accounting for memory bank contention and delay in high-bandwidth multiprocessors

被引：17

作者：

Blelloch, GE

Gibbons, PB

Matias, Y

Zagha, M

机构：

[1] AT&T BELL LABS, LUCENT TECHNOL, MURRAY HILL, NJ 07974 USA

[2] TEL AVIV UNIV, DEPT COMP SCI, IL-69978 TEL AVIV, ISRAEL

[3] SILICON GRAPH INC, Mountain View, CA 94043 USA

来源：

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 1997年 / 8卷 / 09期

基金：

美国国家科学基金会;

关键词：

memory bank contention; memory delays; parallel machine models; performance analysis; parallel algorithms; shared memory; multiprocessors;

D O I：

10.1109/71.615440

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant's bulk-synchronous parallel (ssp) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the GRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the GRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMS on the (d, x)-BSP.

引用

页码：943 / 958

页数：16

共 55 条

[1] SPARCLE - AN EVOLUTIONARY PROCESSOR DESIGN FOR LARGE-SCALE MULTIPROCESSORS [J].

AGARWAL, A ;

KUBIATOWICZ, J ;

KRANZ, D ;

LIM, BH ;

YEUNG, D ;

DSOUZA, G ;

PARKIN, M .

IEEE MICRO, 1993, 13 (03) :48-61

[2]

ALVERSON R, 1990, P INT C SUP JUN

[3]

[Anonymous], 1973, SORTING SEARCHING

[4]

[Anonymous], P 5 INT C SUP

[5]

Bader DA, 1996, 10TH INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM - PROCEEDINGS OF IPPS '96, P292

[6]

BAILEY DH, 1987, IEEE T COMPUT, V36, P293, DOI 10.1109/TC.1987.1676901

[7] INTERFERENCE IN MULTIPROCESSOR COMPUTER-SYSTEMS WITH INTERLEAVED MEMORY [J].

BASKETT, F ;

SMITH, AJ .

COMMUNICATIONS OF THE ACM, 1976, 19 (06) :327-334

[8]

BISSELING RH, 1994, P 13 IFIP WORLD COMP, V1, P509

[9]

BLELLOCH GE, 1997, IN PRESS THEORY COMP

[10]

BLELLOCH GE, 1993, CMUCS93173 CARN MELL

← 1 2 3 4 5 6 →