Communication lower bounds for distributed-memory matrix multiplication

Cited by: 126
Authors
Irony, D
Toledo, S [1 ]
Tiskin, A
Affiliations
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[2] Univ Warwick, Dept Comp Sci, Coventry CV4 7AL, W Midlands, England
Funding
Israel Science Foundation;
Keywords
communication; lower bounds; distributed memory; matrix multiplication;
DOI
10.1016/j.jpdc.2004.03.021
CLC classification
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
We present lower bounds on the amount of communication that matrix multiplication algorithms must perform on a distributed-memory parallel computer. We denote the number of processors by P and the dimension of the square matrices by n. We show that the most widely used class of algorithms, the so-called two-dimensional (2D) algorithms, is optimal, in the sense that in any algorithm that uses only O(n²/P) words of memory per processor, at least one processor must send or receive Ω(n²/P^{1/2}) words. We also show that algorithms from another class, the so-called three-dimensional (3D) algorithms, are optimal. These algorithms use replication to reduce communication: in any algorithm that uses O(n²/P^{2/3}) words of memory per processor, at least one processor must send or receive Ω(n²/P^{2/3}) words. Furthermore, we show a continuous tradeoff between the size of local memories and the amount of communication that must be performed; the 2D and 3D bounds are essentially instantiations of this tradeoff. We also show that if the input is distributed across the local memories of multiple nodes without replication, then Ω(n²) words must cross any bisection cut of the machine. All our bounds apply only to conventional Θ(n³) algorithms; they do not apply to Strassen's algorithm or other o(n³) algorithms. (C) 2004 Elsevier Inc. All rights reserved.
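The continuous memory-communication tradeoff described in the abstract can be sketched numerically. The helper below is hypothetical (not from the paper's text) and drops all constant factors: it evaluates the asymptotic per-processor bound Ω(n³/(P·M^{1/2})) for local memory size M, and checks that plugging in the 2D memory size n²/P and the 3D memory size n²/P^{2/3} recovers the two stated bounds.

```python
import math

def comm_lower_bound(n, P, M):
    """Asymptotic per-processor communication lower bound, in words,
    for conventional Theta(n^3) matrix multiplication with M words of
    local memory per processor: Omega(n^3 / (P * sqrt(M))).
    Constant factors are dropped; this is an illustrative sketch only."""
    return n**3 / (P * math.sqrt(M))

n, P = 4096, 64
M_2d = n**2 / P            # 2D algorithms: O(n^2/P) memory per processor
M_3d = n**2 / P**(2 / 3)   # 3D algorithms: O(n^2/P^(2/3)) memory per processor

# 2D instantiation of the tradeoff recovers Omega(n^2 / P^(1/2))
assert math.isclose(comm_lower_bound(n, P, M_2d), n**2 / math.sqrt(P))
# 3D instantiation recovers Omega(n^2 / P^(2/3))
assert math.isclose(comm_lower_bound(n, P, M_3d), n**2 / P**(2 / 3))
```

Note how the bound decreases as M grows: giving each processor more memory (replication, as in 3D algorithms) provably reduces the communication that any conventional algorithm must perform.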
Pages: 1017-1026 (10 pages)