A modified parallel tree code for N-body simulation of the large-scale structure of the universe

Times cited: 11
Authors
Becciani, U. [1]
Antonuccio-Delogu, V. [1]
Gambera, M. [1]
Affiliation
[1] Osservatorio Astrofis Catania, I-95125 Catania, Italy
Keywords
N-body simulations; parallel computing
DOI
10.1006/jcph.2000.6557
Chinese Library Classification (CLC) code
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
N-body codes for simulating the origin and evolution of the large-scale structure of the universe have improved significantly over the past decade, both in the resolution achieved and in the reduction of CPU time. However, state-of-the-art N-body codes can hardly handle more than a few 10^7 particles, even on the largest parallel systems. To allow simulations at higher resolution, we first reconsidered the grouping strategy described in J. Barnes (1990, J. Comput. Phys. 87, 161) (hereafter B90) and applied it, with some modifications, to our WDSH-PT (Work and Data SHaring-Parallel Tree) code (U. Becciani et al., 1996, Comput. Phys. Comm. 99, 1). In the first part of this paper we give a short description of the code, which adopts the algorithm of J. E. Barnes and P. Hut (1986, Nature 324, 446), focusing on the memory and work distribution strategy used to distribute the data on a CC-NUMA machine such as the CRAY-T3E system. In very large simulations (typically N ≥ 10^7), network contention and the formation of clusters of galaxies easily lead to an uneven load. To remedy this, we devised an automatic work redistribution mechanism that provides good dynamic load balance without adding significant overhead. In the second part of the paper we describe the modification we devised to the Barnes grouping strategy to improve the performance of the WDSH-PT code. We exploit the property that nearby particles have similar interaction lists. This idea was already used in B90, where an interaction list is built that applies everywhere within a cell C_group containing a small number of particles, bounded by N_crit; B90 then reuses this interaction list for each particle p ∈ C_group in turn. We likewise assume that every particle p in C_group has the same interaction list.
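The B90 grouping idea can be illustrated with a minimal Python sketch. It assumes cubic tree cells summarized by mass, center of mass, and side length, a group C_group represented by its axis-aligned bounding box, and an opening parameter theta = 0.7; the names Cell, min_dist_to_box, and split_group_interaction_list are hypothetical and not taken from WDSH-PT. A cell is accepted for the whole group only if the Barnes-Hut criterion size/d < theta holds with d measured to the nearest point of the group's box. Because every particle in the group is at least that far away, the criterion then holds for each of them, so one interaction list can be shared.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    mass: float
    com: tuple    # center of mass (x, y, z)
    size: float   # side length of the cubic cell


def min_dist_to_box(point, box_lo, box_hi):
    """Distance from point to the nearest point of an axis-aligned box (0 inside)."""
    d2 = 0.0
    for p, lo, hi in zip(point, box_lo, box_hi):
        if p < lo:
            d2 += (lo - p) ** 2
        elif p > hi:
            d2 += (p - hi) ** 2
    return math.sqrt(d2)


def split_group_interaction_list(cells, box_lo, box_hi, theta=0.7):
    """Split candidate cells into a shared 'far' list and a 'near' list for C_group.

    A cell goes to 'far' when size / d_min < theta, with d_min measured to the
    nearest point of the group's bounding box; since every particle in the group
    is at least d_min away, the criterion then holds for all of them.
    """
    far, near = [], []
    for cell in cells:
        d_min = min_dist_to_box(cell.com, box_lo, box_hi)
        if d_min > 0.0 and cell.size / d_min < theta:
            far.append(cell)
        else:
            near.append(cell)  # a full tree code would open these recursively
    return far, near


if __name__ == "__main__":
    cells = [
        Cell(1.0, (10.0, 0.5, 0.5), 1.0),  # far away: accepted as one pseudo-particle
        Cell(1.0, (1.5, 0.5, 0.5), 1.0),   # adjacent: size/d = 2, treated as near
        Cell(2.0, (0.5, 5.0, 0.5), 2.0),   # size/d = 0.5 < 0.7: accepted
    ]
    far, near = split_group_interaction_list(cells, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
    print(len(far), len(near))  # 2 1
```

In a real tree code the cells classified as near would be opened recursively down to smaller cells or individual particles rather than simply collected; the sketch stops at the classification step.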
We decompose the force F_p acting on a particle p into two terms, F_p = F_far + F_near. The first term, F_far, is the same for every particle in the cell; it is generated by the interaction between a hypothetical particle placed at the center of mass of C_group and the more distant cells contained in the interaction list. F_near differs from particle to particle and is generated by the interaction between p and the elements near C_group. This reduces the CPU time and increases the code performance, enabling us to run simulations with a large number of particles (N ~ 10^7 to 10^9) in nonprohibitive CPU times. © 2000 Academic Press.
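The decomposition F_p = F_far + F_near can be sketched in Python under several assumptions not stated in the abstract: cells are treated as point masses (monopole only), gravity uses G = 1 in code units with a Plummer softening length EPS, and accel_from, vadd, and group_forces are hypothetical helper names rather than WDSH-PT routines. The sketch returns accelerations (force per unit mass), so F_p is m_p times the result. F_far is evaluated once at the group's center of mass and shared, while F_near is summed directly for each particle.

```python
import math

G = 1.0      # gravitational constant in code units (assumption)
EPS = 1e-3   # Plummer softening length (assumption, not from the paper)


def accel_from(src_pos, src_mass, pos):
    """Softened Newtonian acceleration at pos due to a point mass at src_pos."""
    dx = [s - p for s, p in zip(src_pos, pos)]
    r2 = sum(d * d for d in dx) + EPS * EPS
    inv_r3 = 1.0 / (r2 * math.sqrt(r2))
    return [G * src_mass * d * inv_r3 for d in dx]


def vadd(a, b):
    """Component-wise sum of two 3-vectors."""
    return [x + y for x, y in zip(a, b)]


def group_forces(group, far_cells, near_bodies):
    """Per-particle accelerations in C_group via F_p = F_far + F_near.

    group       -- list of (position, mass) for the particles in C_group
    far_cells   -- list of (center_of_mass, mass) from the shared far list
    near_bodies -- list of (position, mass) for bodies near C_group
    """
    total_mass = sum(m for _, m in group)
    com = [sum(p[k] * m for p, m in group) / total_mass for k in range(3)]

    # F_far: computed ONCE for a hypothetical particle at the group's
    # center of mass, then shared by every particle in the group.
    a_far = [0.0, 0.0, 0.0]
    for c_pos, c_mass in far_cells:
        a_far = vadd(a_far, accel_from(c_pos, c_mass, com))

    result = []
    for pos, _ in group:
        # F_near: direct summation, different for each particle. With
        # softening, a self-term has zero separation and adds exactly zero.
        a_near = [0.0, 0.0, 0.0]
        for q_pos, q_mass in near_bodies:
            a_near = vadd(a_near, accel_from(q_pos, q_mass, pos))
        result.append(vadd(a_far, a_near))
    return result


if __name__ == "__main__":
    group = [((0.0, 0.0, 0.0), 1.0), ((0.1, 0.0, 0.0), 1.0)]
    far = [((100.0, 0.0, 0.0), 1000.0)]
    for acc in group_forces(group, far, near_bodies=group):
        print(acc)
```

Because of the softening, a particle's interaction with itself has a zero separation vector and contributes exactly zero, so near_bodies may include the group members themselves without special-casing. The saving comes from the far loop running once per group instead of once per particle.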
Pages: 118-132
Page count: 15