Mesh-Based Cosmology Code: MC^2

Background

There are several different computational approaches to model structure formation in the dark matter component. These include tree-based N-body, particle-in-cell (PIC) or particle-mesh (PM), and hybrid methods. Fundamentally, each of these methods attempts to solve the Vlasov-Poisson equations for a collisionless gas of gravitating particles. The structure of galaxy clusters and galactic halos depends on baryon hydrodynamics, as does the analysis of Lyman-alpha clouds, therefore, a complete cosmological simulation framework must incorporate this physics.

PM codes model a collisionless, self-gravitating fluid as particles, integrating the equations of motion for particles, interacting self-consistently via Newtonian forces, forward in time. The particles interact with each other via the potential field created by all of them, computed on a three-dimensional spatial grid. Particles are deposited on the grid by interpolation methods to create a density field; we use both linear (Cloud-In-Cell, CIC) and quadratic interpolation (Triangle-Shaped-Cloud, TSC), the Poisson equation is then solved by any of the available fast methods (FFT, multi-grid). This method reduces the computational complexity of a direct particle-particle calculation from N_p^2 (where N_p is the number of particles) to N_p. The number of grid cells per dimension, N_g, determines the code's spatial resolution.

PM codes are the fastest of all N-body codes, have well-defined error bounds, and have been extensively tested in a variety of applications. In addition, being grid codes, they are relatively easy to couple to grid-based hydro and plasma codes. However, since the resolution of the code is tied to the grid size, they do not have the spatial resolution offered by tree codes. In our project, HOT (tree) and MC^2 (PM) are used as appropriate depending on the particular application.

MC^2 is a parallel PM dark matter code based on Vlasov-Poisson solvers developed at Los Alamos (the IMPACT code) and at UCLA (Viktor Decyk's UPIC framework). MC^2 incorporates a simplified treatment of baryons via the Hydro-Particle-Mesh (HPM) method; neutrino and more complete hydrodynamic modules are under development. MC^2 is designed to provide excellent performance for maximum values of N_p=N_g=2048^3; this number will increase as available computational resources continue to improve in size and performance. Beacuse of the ubiquity of periodic boundary conditions in cosmology problems, MC^2 uses a FFT-based solver for the Poisson equation; time-stepping is handled via a symplectic method.

MC^2 is a Fortran code and exists in two versions, one based on High Performance Fortran (HPF), the other on a Fortran95/MPI-based framework implementation (UPIC). These latest variants of Fortran provide a powerful high level language designed for mathematical modeling, safety, and performance. MC^2 is written to be attractive to scientists who want to spend minimal time in programming and understanding the code structure, but at the same time possess a high level of performance. The UPIC version is one of the world's fastest portable parallel PM codes (see performance figures below).

The main developers of MC^2 are Salman Habib (LANL), Katrin Heitmann (LANL), Robert Ryne (LBNL), and Viktor Decyk (UCLA). Other contributors include Somnath Bharadwaj (IIT), Adam Lidz (Columbia), Varun Sahni (IUCAA), B.S. Sathyaprakash (Cardiff), and Luis Teodoro (LANL).

Code Structure

MC^2 has a classic, modular program structure consisting of a main program, initialization routines, a processing driver, and analysis and reporting routines. The code is easy to modify and enhance; the HPF version can be easily understood, used, and modified by someone with only minimal parallel programming experience. This makes this version invaluable for use by students, post-docs, and researchers who are primarily scientists and not programmers. The simplicity of the code structure also helps with defensive programming (error checks, debugging). The UPIC version uses trusted legacy Fortran77 routines in the lowest layer; these routines have been very carefully programmed to provide the highest performance possible. The middle layer is written in Fortran95 and encapsulates legacy code with simple interfaces and provides aditional safety checks.

In order to run the program, initial data is provided by a single input file; certain output options being specified by another file. The I/O data format has been fixed by the research collaboration and a description of the format is available. Initial conditions are set up by either generating the primordial power spectrum and transfer functions internally in the code using published fits or by running CMBFAST. The resulting power spectrum is then normalized as chosen by the user, and an initial condition for particle positions and velocities determined by the standard Zeldovich approximation technique (the code checks the initial redshift for accuracy: if this value is too large, it is rejected, and the code stops).

Time-stepping is performed by particle deposition on an N_g^3 grid by a CIC or TSC routine, the solution of the Poisson equation carried out by convolving a pseudo-Green function with the FFT of the density field; the force on the grid can be either directly computed in Fourier space or differenced later from the scalar potential. Once the force is available on the grid, it is interpolated onto the particles, and the particles are stepped forward in time using a (2nd or 4th order) split-operator symplectic method.

The HPM version of MC^2 adds a second particle species (baryons); these particles, aside from the usual gravitational interaction, also feel pressure gradient forces. These forces are modeled by assuming an ideal gas equation of state and a temperature-density relation for the gas. The specific enthalpy now plays a role identical to that of the gravitational potential, in that the gradient of this quantity provides the pressure-gradient force on each particle. Thus at each time-step the sum of the potential and the enthalpy lead to a generalized potential whose gradient provides the total force on the baryonic particles (the dark matter particles only interact gravitationally).

The panels above show a two-dimensional slice of the dark matter distribution (left) and the baryon (gas) distribution (right) from a test HPM simulation. Colors represent particle velocity magnitudes. The smoothing of the gas density by pressure forces versus the distribution of dark matter particles is clearly seen. The length scale shown here is 2.5 Mpc/h.

Code Tests and Verification

MC^2 has undergone a rigorous program of testing and verification against other cosmology codes such as HOT, GADGET, HYDRA, and the Princeton TPM code. These results will be published shortly.

The cosmological pancake problem provides a good simultaneous test of the particle dynamics, Poisson solver, and cosmological expansion modules. In addition, it is also a sensitive test of particle collisionality. The figure on the right shows pancake test results for MC^2. The collapse of a single perturbation mode (at some angle to the simulation box plane) is followed well past caustic formation; this is a stringent test of the code's ability to track thin, poorly resolved features. As particles accelerate toward the density midplane, their phase profile develops a backwards `S' shape, and, at caustic formation, the velocity becomes multi-valued at the midplane. Some of the particles that have passed through the midplane fall back and form another pair of caustics, twisting the phase profile further. This caustic formation process repeats an arbitrary number of times, though in practice a numerical code can only follow to an extent limited by its force and mass resolution. In the figure on the right, the solid line is a result form a very high-resolution one-dimensional simulation, red, green, and blue points show the convergence to the exact solution. The clean spiral shape also shows the absence of numerical collsionality.

The four panels below show a two-dimensional spatial slice of the present particle distribution from a simulation with 17 million particles run on four different codes with identical initial conditions (Box size=360Mpc). The colors represent the absolute values of the particle velocities. The overall agreement is very good, even at an individual particle to particle comparison level.

Another well-known test is simulation of the `Santa Barbara' cluster, a publicly available initial condition set. The two panels below show the projected density field of this simulated cluster as produced by MC^2 run with 17 million particles. The first panel is at a redshift of z=0.5, while the second panel is at a redshit of z=0 (today). The two central mass concentrations clearly visible in the first panel have merged into one in the second panel.

Application Example: The Lyman-alpha Forest

MC^2 simulations of Lyman-alpha absorption are being used to analyse data from approximately 3000 SDSS QSO lines of sight. These simulations are carried out using both the HPM and pure dark matter versions. As large numbers of particles are needed in order to reduce shot noise in the numerically obtained density field, production simulations are carried out with 130 million particles.

Several strategies are possible to utilize large scale structure simulations in extracting information form the measured flux power spectrum. The aim is to settle on a solution which has acceptable error (less than 10%), allows for simulating a whole range of parameters, and is time-efficient. Full hydro simulations are the most accurate but extremely time-consuming, HPM simulations less accurate but much faster; the fastest, but least accurate method is to (Gaussian) filter a dark matter simulation, the filtering simulating the action of baryonic pressure forces.

The dependence of the flux power spectrum on the reionization history of the Universe needs to be understood; if this were an important effect, the extraction of cosmological information from the Lyman-alpha forest would be badly compromised. The HPM version of MC^2 allows us to study complex reionization histories by implementing complicated temperature histories, one such T(z) history is shown on the right (top panel). Fortunately, our work has shown that the reioization history has only a minor effect on the extraction of the linear power spectrum from the Lyman-alpha forest.

Measurements of the Lyman-alpha forest probe the primordial power spectrum at very small scales which can never be achieved by CMB measurements. This presents unique opportunities for constraining cosmological parameters, in particular the running of the spectral index. An example of how the SDSS Lyman-alpha data cam constrain the normalization of the spectrum is shown at right (bottom panel). Green points with error bars are SDSS measurements of the flux power spectrum at z=2.85, the curves are MC^2 predictions for the flux power spectrum for a range of amplitudes parametrized by
Delta^2=k_p^3P(k_p,z=2.85)/(2pi^2) at k_p=0.008s/km=0.877h Mpc^(-1).

Code Performance

MC^2 has been run on a variety of parallel platforms, e.g., on QSC at LANL (Alpha cluster), on Seaborg at NERSC (IBM Power3), and on Beowulf Pentium clusters. Performance information is provided in the tables below; the code scales linearly upto the largest number of processors tested so far (2048). As optimization is a continuous process, the figures below are subject to change.

The times below are for a 3D particle simulation, using 127,401,984 particles and a 256X32X512 mesh for 425 time steps using UPIC. Push time is the time to update one particle's position and deposit its mass on the grid, for one time step. Loop time is the total time for running the simulation minus the initalization time. QSC is the fastest machine on which the code has been timed so far though a Xeon/Myrinet cluster such as Pink is almost as fast. Note that, as far as MC^2 is concerned, QSC has almost twice the performance of the SP3 at NERSC and is more than 3 times faster than the previous generation T3E.

Computer Processors Push time Loop time
HP AlphaServer ES45 (QSC) 256 2.4 nsec 151.6 sec
HP AlphaServer ES45 (QSC) 128 5.0 nsec 305.1 sec
HP AlphaServer ES45 (QSC) 64 9.7 nsec 581.4 sec
HP AlphaServer ES45 (QSC) 32 19 nsec 1101.1 sec
Intel 2.4 GHz P4 Xeon, Myrinet 256 2.6 nsec 160.4 sec
Intel 2.4 GHz P4 Xeon, Myrinet 128 4.7 nsec 295.9 sec
Intel 2.4 GHz P4 Xeon, Myrinet 64 9.8 nsec 602.5 sec
Intel 2.4 GHz P4 Xeon, Myrinet 32 19 nsec 1194.9 sec
IBM SP3/375 (Seaborg) 1024 1.4 nsec 113.5 sec
IBM SP3/375 (Seaborg) 512 2.0 nsec 136.9 sec
IBM SP3/375 (Seaborg) 256 4.2 nsec 251.0 sec
IBM SP3/375 (Seaborg) 128 7.4 nsec 438.2 sec
IBM SP3/375 (Seaborg) 64 15 nsec 915.9 sec
IBM SP3/375 (Seaborg) 32 28 nsec 1640.8 sec
Cray T3E-900 (MCurie) 512 3.8 nsec 227.5 sec
Cray T3E-900 (MCurie) 256 8.0 nsec 467.7 sec
Cray T3E-900 (MCurie) 128 15 nsec 915.9 sec
Cray T3E-900 (MCurie) 64 31 nsec 1817.2 sec

The HPF version of MC^2 is still undergoing performance optimization, but preliminary results are available below. The Gflop/s figures are from simulations on the NERSC IBM SP3. The code runs 2-3 times faster on QSC using the native Compaq -hpf extension of f90.

Code Processors GFlop/s Mflops/proc
MC^2/HPF 16 6 375
MC^2/HPF 32 11 344
MC^2/HPF 64 19 297

Back to Cosmology/Beams Main Page.

Back to ICP Main Page.

Salman Habib / LANL / revised November 03
Valid HTML 4.0!