Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid

Cited by: 69
Authors
Goeddeke, Dominik [1]
Strzodka, Robert [2]
Affiliations
[1] TU Dortmund, Dept Appl Math, Fak Math, D-44227 Dortmund, Germany
[2] Max Planck Inst Informat, D-66123 Saarbrucken, Germany
Keywords
GPU Computing; mixed-precision iterative refinement; multigrid; tridiagonal solvers; cyclic reduction; finite elements; NVIDIA CUDA; SERIES LINEAR ALGEBRA; GRAPHICS; SYSTEM; FLOW
DOI
10.1109/TPDS.2010.61
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother, we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed-precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.
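The cyclic reduction scheme named in the abstract can be illustrated with a minimal serial sketch. This is not the authors' CUDA implementation: the function name `cyclic_reduction` and the list-based layout are assumptions made for illustration. It assumes a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] with a[0] = 0, c[n-1] = 0, and nonzero pivots (for example, a diagonally dominant matrix). Each level eliminates the even-indexed unknowns, leaving a tridiagonal system of half the size in the odd-indexed ones. The updates within one level are mutually independent, which is what makes the method attractive for parallel hardware.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    a: sub-diagonal (a[0] must be 0), b: main diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    Illustrative serial sketch; assumes no pivot becomes zero.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: combine each odd equation with its two even
    # neighbours so that only odd-indexed unknowns remain coupled.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_d = d[i] - alpha * d[i - 1]
        new_c = 0.0
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_d -= gamma * d[i + 1]
            new_c = -gamma * c[i + 1]
        ra.append(-alpha * a[i - 1])
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    # Solve the half-size system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

In a GPU setting, the two loops over `i` would typically be mapped to parallel threads, with one synchronization per reduction level. The recursion here only keeps the sketch short.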
Pages: 22-32
Number of pages: 11