Practical experience in the numerical dangers of heterogeneous computing

Cited by: 14
Authors
Blackford, LS
Cleary, A
Petitet, A
Whaley, RC
Demmel, J
Dhillon, I
Ren, H
Stanley, K
Dongarra, J
Hammarling, S
Affiliations
[1] UNIV CALIF BERKELEY, DIV COMP SCI, BERKELEY, CA 94720 USA
[2] OAK RIDGE NATL LAB, KNOXVILLE, TN 37996 USA
[3] NUMER ALGORITHMS GRP LTD, OXFORD OX2 8DR, ENGLAND
Source
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE | 1997 / Vol. 23 / No. 02
Keywords
distributed-memory systems; floating-point arithmetic; heterogeneous processor networks; message passing; numerical software; reliability
DOI
10.1145/264029.264030
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Special challenges exist in writing reliable numerical library software for heterogeneous computing environments. Although a lot of software for distributed-memory parallel computers has been written, porting this software to a network of workstations requires careful consideration. The symptoms of heterogeneous computing failures can range from erroneous results without warning to deadlock. Some of the problems are straightforward to solve, but for others the solutions are not so obvious, or incur an unacceptable overhead. Making software robust on heterogeneous systems often requires additional communication. We describe and illustrate the problems encountered during the development of ScaLAPACK and the NAG Numerical PVM Library. Where possible, we suggest ways to avoid potential pitfalls, or if that is not possible, we recommend that the software not be used on heterogeneous networks.
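As a rough illustration of the failure mode the abstract describes, the sketch below simulates two processes that apply the same stopping test with different machine epsilons. The names (`converged`, `local_decisions`, `agreed_decisions`), the tolerance factor, and the single/double epsilon pairing are assumptions made for this example and are not taken from ScaLAPACK or the NAG Numerical PVM Library. When each process decides locally, the decisions can differ, which in a real message-passing program could leave one process waiting on a message that is never sent. Having one process decide and share the result is one way the "additional communication" mentioned above might look.

```python
# Minimal sketch: a stopping test that disagrees across simulated processes.
# All names and constants here are hypothetical, chosen only for illustration.

EPS_DOUBLE = 2.0 ** -52   # machine epsilon of IEEE double precision
EPS_SINGLE = 2.0 ** -23   # machine epsilon of IEEE single precision


def converged(residual: float, eps: float) -> bool:
    """Stopping test relative to the local machine epsilon (assumed factor 10)."""
    return residual <= 10 * eps


def local_decisions(residual: float, eps_per_process: list[float]) -> list[bool]:
    """Each simulated process decides on its own, using its own epsilon."""
    return [converged(residual, eps) for eps in eps_per_process]


def agreed_decisions(residual: float, eps_per_process: list[float]) -> list[bool]:
    """Process 0 decides; every other process adopts that decision.

    In a real program this would be a broadcast of one flag from process 0.
    """
    decision = converged(residual, eps_per_process[0])
    return [decision] * len(eps_per_process)


if __name__ == "__main__":
    eps_values = [EPS_DOUBLE, EPS_SINGLE]
    print(local_decisions(1e-10, eps_values))   # processes disagree
    print(agreed_decisions(1e-10, eps_values))  # processes agree
```

The epsilon gap here is deliberately exaggerated so the disagreement is easy to see; on real heterogeneous networks the differences between machines are typically much smaller, but the control-flow consequence is the same.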
Pages: 133-147
Number of pages: 15