Accelerating scientific computations with mixed precision algorithms

被引:133
作者
Baboulin, Marc [2 ]
Buttari, Alfredo [3 ]
Dongarra, Jack [1 ,4 ,5 ]
Kurzak, Jakub [1 ]
Langou, Julie [1 ]
Langou, Julien [6 ]
Luszczek, Piotr [7 ]
Tomov, Stanimire [1 ]
机构
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
[2] Univ Coimbra, Dept Math, Coimbra, Portugal
[3] French Natl Inst Res Comp Sci & Control, Lyon, France
[4] Oak Ridge Natl Lab, Oak Ridge, TN USA
[5] Univ Manchester, Manchester, Lancs, England
[6] Univ Colorado Denver, Dept Math & Stat Sci, Denver, CO USA
[7] MathWorks Inc, Natick, MA USA
关键词
Numerical linear algebra; Mixed precision; Iterative refinement; LINEAR ALGEBRA SUBPROGRAMS; MODEL IMPLEMENTATION; MULTIFRONTAL METHOD; EXTENDED SET; ACCURACY; SYSTEMS; SOLVER;
D O I
10.1016/j.cpc.2008.11.005
中图分类号
TP39 [计算机的应用];
学科分类号
080201 [机械制造及其自动化];
摘要
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (CPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented. Program summary Program title: ITER-REF Catalogue identifier: AECO_v1_0 Program summary URL: http://cpc.cs.qub.ac.Lik/summaries/AECO-vl-O.html Program obtainable from: CPC Program Library, Queen's University. Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. oflines in distributed program, including test data, etc.: 7211 No. of bytes in distributed program, including test data, etc.: 41862 Distribution format: tar.gz Programming language: FORTRAN 77 Computer: desktop, server Operating system: Unix/Linux RAM: 512 Mbytes Classification: 4.8 External routines: BIAS (optional) Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA = LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly = Pb (forward substitution) and then solving Ux = y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration. which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Running time: seconds/minutes Published by Elsevier B.V.
引用
收藏
页码:2526 / 2533
页数:8
相关论文
共 47 条
[1]
Hybrid scheduling for the parallel solution of linear systems [J].
Amestoy, PR ;
Guermouche, A ;
L'Excellent, JY ;
Pralet, S .
PARALLEL COMPUTING, 2006, 32 (02) :136-156
[2]
Multifrontal parallel distributed symmetric and unsymmetric solvers [J].
Amestoy, PR ;
Duff, IS ;
L'Excellent, JY .
COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2000, 184 (2-4) :501-520
[3]
A fully asynchronous multifrontal solver using distributed dynamic scheduling [J].
Amestoy, PR ;
Duff, IS ;
L'Excellent, JY ;
Koster, J .
SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2001, 23 (01) :15-41
[4]
Anderson E, 1999, LAPACK USERS GUIDE, DOI [10.1137/1.9780898719604, DOI 10.1137/1.9780898719604]
[5]
[Anonymous], 1997, Applied Numerical Linear Algebra
[6]
[Anonymous], 2003, ITERATIVE METHODS SP, DOI DOI 10.1137/1.9780898718003
[7]
[Anonymous], 1973, Introduction to matrix computations
[8]
SOLVING SPARSE LINEAR-SYSTEMS WITH SPARSE BACKWARD ERROR [J].
ARIOLI, M ;
DEMMEL, JW ;
DUFF, IS .
SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 1989, 10 (02) :165-190
[9]
Arioli M., 2008, RALTR2008006
[10]
ASHCRAFT CC, 1987, INT J SUPERCOMPUT AP, V1, P10