Optimizing sparse matrix-vector product computations using unroll and jam

被引：49

作者：

Mellor-Crummey, J ^{[1
]}

Garvin, J ^{[1
]}

机构：

[1] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA

来源：

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS | 2004年 / 18卷 / 02期

关键词：

sparse matrices; matrix vector product; performance optimization; sparse matrix format; microfactors; data structures;

D O I：

10.1177/1094342004038951

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Large-scale scientific applications frequently compute sparse matrix-vector products in their computational core. For this reason, techniques for computing sparse matrix-vector products efficiently on modern architectures are important. In this paper we describe a strategy for improving the performance of sparse matrix-vector product computations using a loop transformation known as unroll-and-jam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices that have rows with a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We show that our strategy is effective for improving sparse matrix-vector product performance using these matrices on MIPS R12000, Alpha Ev67, IBM Power 3, and Itanium 2 processors. Our measurements show that for this class of sparse matrices, our strategy improves sparse matrix-vector product performance from a low of 41% on MIPS to well over a factor of 2 on Itanium.

引用

页码：225 / 236

页数：12

共 19 条

[1]

AGARWAL R, 1992, P SUP 1992 MINN MN

[2]

Allen Frances E., 1972, Design and Optimization of Compilers. Ed. by

[3]

[Anonymous], 2001, LNCS, DOI DOI 10.1007/3-540-45545-0

[4] ESTIMATING INTERLOCK AND IMPROVING BALANCE FOR PIPELINED ARCHITECTURES [J].

CALLAHAN, D ;

COCKE, J ;

KENNEDY, K .

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1988, 5 (04) :334-358

[5]

CALLAHAN D, 1990, P SIGPLAN90 C PROGR

[6] IMPROVING THE RATIO OF MEMORY OPERATIONS TO FLOATING-POINT OPERATIONS IN LOOPS [J].

CARR, S ;

KENNEDY, K .

ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 1994, 16 (06) :1768-1810

[7]

DAS R, 1992, P 30 AER SCI M AIAA

[8]

GEORGE A, 1981, COMPUTER SOLUTION LA

[9]

GROPP W, 2000, SIAM ANN M SAN JUAN

[10]

Im E.-J., 2000, Ph.D. dissertation

← 1 2 →