P-Gadget3 is a computational astrophysics code. Scientists use it to simulate self-gravitating systems with complex gas physics added. The problems it helps to solve include the formation of cosmological large-scale structures, clusters, and galaxies, as well as star formation and metal enrichment. To model the physics of gas, P-Gadget3 uses smoothed particle hydrodynamics (SPH), a mesh-free computational method that approximates parcels of gas or fluid as particles. The code scales to hundreds of thousands of cores and has a considerable user base.
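To make the SPH idea concrete, here is a minimal sketch of a density estimate, in which each particle's density is a kernel-weighted sum over the masses of the other particles. It is not taken from P-Gadget3: the names `Particle`, `cubic_spline_kernel`, and `density_at` are hypothetical, the kernel is the common cubic spline with support radius 2h (an assumption, the production code may use a different convention), and the brute-force loop over all particles stands in for the neighbor search a real code would use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle record for illustration only.
struct Particle {
    double x, y, z;  // position
    double mass;
};

// Cubic spline smoothing kernel in 3D with support radius 2h (assumed convention).
double cubic_spline_kernel(double r, double h) {
    assert(h > 0.0);
    const double pi = 3.14159265358979323846;
    const double q = r / h;
    const double norm = 1.0 / (pi * h * h * h);
    if (q < 1.0) {
        return norm * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    }
    if (q < 2.0) {
        const double t = 2.0 - q;
        return norm * 0.25 * t * t * t;
    }
    return 0.0;  // particles beyond 2h do not contribute
}

// Density at particle i: sum of m_j * W(|r_i - r_j|, h) over all particles j,
// including i itself. A real code would visit only nearby neighbors.
double density_at(std::size_t i, const std::vector<Particle>& p, double h) {
    assert(i < p.size());
    double rho = 0.0;
    for (const Particle& pj : p) {
        const double dx = p[i].x - pj.x;
        const double dy = p[i].y - pj.y;
        const double dz = p[i].z - pj.z;
        const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
        rho += pj.mass * cubic_spline_kernel(r, h);
    }
    return rho;
}
```

Calling `density_at(0, particles, h)` returns the smoothed density at the first particle. The inner loop over neighbors is the kind of hot spot that SPH optimization work typically targets.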
Recently we learned about the performance optimization work on the SPH solver in P-Gadget3. It was carried out by Dr. Fabio Baruffa, a Senior HPC Application Specialist at the Leibniz Supercomputing Centre.
Dr. Baruffa described his working methods:
- isolating the kernel code with serialization,
- using Intel® VTune™ to spot bottlenecks, and
- following the principle of a minimally invasive approach.
He also demonstrated performance optimization techniques used in this project, such as
- transformation to lockless loops and
- improved vectorization with the help of runtime conversion of an array of structures (AoS) to a structure of arrays (SoA).
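The AoS-to-SoA conversion mentioned in the list above can be sketched as follows. This is an illustration under stated assumptions, not the P-Gadget3 implementation: `ParticleAoS`, `ParticlesSoA`, `to_soa`, and `squared_distances` are hypothetical names, and the particle fields are reduced to a position and a mass. The idea is to copy the fields the kernel needs into separate contiguous arrays at run time, so that the compute loop reads memory with unit stride and the original data structure stays untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: the fields of one particle are adjacent in memory.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure of arrays: each field is stored contiguously across particles.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Runtime conversion: copy only the fields the kernel needs into SoA buffers.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    const std::size_t n = aos.size();
    ParticlesSoA soa;
    soa.x.resize(n);
    soa.y.resize(n);
    soa.z.resize(n);
    soa.mass.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        soa.x[i] = aos[i].x;
        soa.y[i] = aos[i].y;
        soa.z[i] = aos[i].z;
        soa.mass[i] = aos[i].mass;
    }
    return soa;
}

// Example compute loop: squared distance of every particle from (cx, cy, cz).
// Each iteration is independent and every access is unit-stride, which makes
// the loop a straightforward candidate for compiler auto-vectorization.
std::vector<float> squared_distances(const ParticlesSoA& p,
                                     float cx, float cy, float cz) {
    const std::size_t n = p.x.size();
    assert(p.y.size() == n && p.z.size() == n);
    std::vector<float> r2(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float dx = p.x[i] - cx;
        const float dy = p.y[i] - cy;
        const float dz = p.z[i] - cz;
        r2[i] = dx * dx + dy * dy + dz * dz;
    }
    return r2;
}
```

The conversion costs an extra copy, so it tends to pay off only when the compute loop does enough work per particle to amortize it. Because it leaves the AoS layout used by the rest of the code as it is, it also fits a minimally invasive approach.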
The optimization efforts produced a tremendous performance gain. On Intel® Xeon® processors, the optimized SPH kernel runs 2.6x to 4.7x faster. On 68-core Intel® Xeon Phi™ processors, the speedup is 20x.