MPI-based implementation of a PCG solver using an EBE architecture and preconditioner for implicit, 3-D finite element analysis

Cited by: 42
Authors
Gullerud, AS
Dodds, RH [1]
Affiliations
[1] Univ Illinois, Dept Civil Engn, Urbana, IL 61801 USA
[2] Sandia Natl Labs, Albuquerque, NM 87185 USA
Keywords
element-by-element computation; Hughes-Winget preconditioner; parallel finite elements; message passing; conjugate gradient; domain decomposition; coloring algorithms;
DOI
10.1016/S0045-7949(00)00153-X
CLC Classification
TP39 [Computer Applications];
Subject Classification
081203; 0835;
Abstract
This work describes a coarse-grain parallel implementation of a linear preconditioned conjugate gradient solver using an element-by-element architecture and preconditioner for computation. The solver, implemented within a nonlinear, implicit finite element code, uses an MPI-based message-passing approach to provide portable parallel execution on shared, distributed, and distributed-shared memory computers. The flexibility of the element-by-element approach permits a dual-level mesh decomposition: a coarse, domain-level decomposition creates a load-balanced domain for each processor for parallel computation, while a second-level decomposition breaks each domain into blocks of similar elements (same constitutive model, order of integration, element type) for fine-grained parallel computation on each processor. The key contribution here is a new parallel implementation of the Hughes-Winget (HW) element-by-element preconditioner suitable for arbitrary, unstructured meshes. The implementation couples an unstructured dependency graph with a new balanced graph-coloring algorithm to schedule parallel computations within and across domains. The code also includes the diagonal preconditioner and a modern parallel (threaded) sparse direct solver for comparison. Three example problems with up to 158,000 elements and 180,000 nodes analyzed on an SGI/Cray Origin 2000 illustrate the parallel performance of the algorithms and preconditioners. Analyses with varying block sizes illustrate that the two-level decomposition improves overall execution speed when the block size is tuned for the cache memory architecture of the executing platform. This implementation of the HW preconditioner shows reasonable parallel efficiency, typically 80% on 48 processors. Efficiency for the diagonal preconditioner is also high, with total speedups reaching 86% of ideal on 48 CPUs. Calculation of the tangent element stiffnesses shows superlinear speedups for each of the test problems, while the computation of strains/stresses/residual forces shows 80% parallel efficiency on 48 processors. (C) 2001 Elsevier Science Ltd. All rights reserved.
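The abstract compares the Hughes-Winget element-by-element preconditioner against the simpler diagonal (Jacobi) preconditioner inside a preconditioned conjugate gradient iteration. That baseline iteration can be sketched as follows (a minimal dense-matrix illustration in Python, not the paper's element-by-element or MPI implementation; the function name `jacobi_pcg` and its interface are invented for this sketch):

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-8, max_iter=200):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner.

    A must be symmetric positive-definite. In the paper's setting the
    products A @ p are accumulated element-by-element in parallel rather
    than from an assembled global matrix; the iteration is the same.
    """
    n = len(b)
    M_inv = 1.0 / np.diag(A)          # diagonal preconditioner M^{-1}
    x = np.zeros(n)
    r = b - A @ x                     # initial residual
    z = M_inv * r                     # preconditioned residual
    p = z.copy()                      # initial search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p                    # the matrix-vector product done EBE in the paper
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # new conjugate search direction
        rz = rz_new
    return x
```

In the paper's solver the matrix-vector product is instead accumulated element-by-element across MPI processes, so no global matrix is ever assembled; only this iteration skeleton carries over, with the Jacobi step replaced by the HW factorized element preconditioner.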
Pages: 553-575
Page count: 23
Related Papers
61 in total
[1] AMIN A, 1994, 8 INT PAR P S CANC, P509
[2] [Anonymous], J ENG MECH DIV, DOI 10.1061/JMCEA3.0000098
[3] [Anonymous], 1995, CHACO USERS GUIDE VE
[4] [Anonymous], 1993, SOLVING LARGE SCALE
[5] [Anonymous], 1995, Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering
[6] ARBOGAST T, 1996, INT C COMP METH WAT, V1, P621
[7] AXELSSON O, CAREY G, LINDSKOG G. On a class of preconditioned iterative methods on parallel computers [J]. International Journal for Numerical Methods in Engineering, 1989, 27(3): 637-654
[8] Bhardwaj M, 2000, INT J NUMER METH ENG, V47, P513, DOI 10.1002/(SICI)1097-0207(20000110/30)47:1/3<513::AID-NME782>3.0.CO;2-V
[10] BINLEY AM, MURPHY MF. Preconditioning finite-element subsurface flow solutions on distributed-memory parallel computers [J]. Advances in Water Resources, 1993, 16(3): 191-202