The general-purpose finite element (FE) system PERMAS(1) has been extended to support shared- and distributed-memory parallel computer architectures as well as workstation clusters. The methods used to parallelize this large application software package are highly general and can parallelize all mathematical operations in an FE analysis, not only the solver. Building on the existing hyper-matrix data structure for large, sparsely populated matrices, a programming tool called PTM was introduced that automatically parallelizes block matrix operations on the fly. PTM completely hides parallelization from higher-level algorithms, thus giving the physically oriented expert a virtually sequential programming environment. An operation graph of sub-matrix operations is built and executed asynchronously. A clustering algorithm distributes the work, performing dynamic load balancing and exploiting data locality. Furthermore, a distributed data management system allows free data access from each node. The generality of the approach is demonstrated by benchmark examples covering different types of FE analyses. © 1998 Elsevier Science Ltd. All rights reserved.
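
The idea of turning a block matrix operation into a graph of sub-matrix operations can be illustrated with a minimal Python sketch. It is not the PTM implementation: the dictionary-of-blocks layout, the function names (`block_mul`, `block_add`, `build_graph`, `hyper_matmul`) and the use of a thread pool are all assumptions made for illustration, and the sketch omits the clustering, load balancing and distributed data management described above. The caller sees one sequential-looking call, while the independent block products run concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sparse block layout: only nonzero blocks are stored,
# keyed by (block_row, block_col). This is an assumption, not the
# actual PERMAS hyper-matrix format.


def block_mul(a, b):
    """Dense product of two small blocks given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def block_add(a, b):
    """Elementwise sum of two blocks of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_graph(A, B):
    """Map each result block (i, j) to the block products it needs."""
    graph = defaultdict(list)
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k == k2:  # only matching inner block indices contribute
                graph[(i, j)].append((a, b))
    return graph


def hyper_matmul(A, B, workers=4):
    """Compute C = A * B blockwise; absent blocks are treated as zero."""
    graph = build_graph(A, B)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Leaf nodes: independent block products, submitted asynchronously.
        futures = {key: [pool.submit(block_mul, a, b) for a, b in ops]
                   for key, ops in graph.items()}
        # Reduction nodes: sum the partial products of each result block.
        C = {}
        for key, futs in futures.items():
            acc = futs[0].result()
            for fut in futs[1:]:
                acc = block_add(acc, fut.result())
            C[key] = acc
    return C


if __name__ == "__main__":
    X = [[1, 2], [3, 4]]
    I2 = [[1, 0], [0, 1]]
    A = {(0, 0): X, (1, 1): I2}
    B = {(0, 0): I2, (0, 1): [[2, 0], [0, 2]]}
    print(hyper_matmul(A, B))
```

In this sketch a result block exists only if at least one pair of stored blocks contributes to it, so sparsity at the block level is preserved without any explicit zero blocks.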