ADDRESS UNIQUE NEEDS IN CLOUD & HPC PROFILING
Achieving the best performance for an HPC application requires a careful balance of Message Passing Interface (MPI) parallelism, threading, vectorization, memory access, and more. Intel® VTune™ Amplifier provides specialized HPC analyses that let developers start with a quick snapshot and then, if needed, get more detail. Software architects tuning the performance of cloud applications will appreciate the ability to profile a running Java* process in a container.
Get a Quick Performance Snapshot
Analyze MPI and non-MPI applications. (Linux* only)
The application performance snapshot features:
- Lightweight, low overhead profiling
- Scalable profiling that detects performance variation across a large number of ranks
- Key metrics, such as MPI and OpenMP* imbalance, low floating-point utilization, and memory stalls
Determine whether this workload will benefit from tuning by viewing all the data in one place (see Fig. 1).
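To make the imbalance metric concrete, here is a minimal Python sketch of one common way to express load imbalance across ranks: the gap between the slowest rank and the average, as a fraction of the slowest rank's time. The function name `imbalance_fraction` and the sample timings are hypothetical, and the formula is an illustrative assumption, not necessarily the exact definition the snapshot reports.

```python
from statistics import mean


def imbalance_fraction(times_per_rank):
    """Return (max - mean) / max for per-rank times given in seconds.

    Assumed, illustrative definition: 0.0 means perfectly balanced ranks;
    values near 1.0 mean most ranks wait on a single slow rank.
    """
    if not times_per_rank:
        raise ValueError("need timings for at least one rank")
    slowest = max(times_per_rank)
    if slowest == 0:
        return 0.0
    return (slowest - mean(times_per_rank)) / slowest


# Hypothetical per-rank compute times in seconds (not measured data).
rank_times = [10.0, 10.0, 10.0, 20.0]
print(f"imbalance: {imbalance_fraction(rank_times):.1%}")  # imbalance: 37.5%
```

In this made-up case, three ranks finish in half the time of the fourth, so on average 37.5% of the slowest rank's time is spent waiting, which suggests the workload would benefit from rebalancing before finer-grained tuning.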
Deeper Analysis with Actionable Detail
See a summary of key HPC performance attributes: MPI efficiency, threading efficiency, memory access efficiency, and floating-point utilization. Then dive into the details and optimize the highest impact items first.
Use the HPC analysis to get a fast overview of the metrics critical to performance on modern hardware, or a more in-depth analysis of each one (see Fig. 2).
Easier Multirank Analysis of MPI and OpenMP*
For hybrid MPI and OpenMP applications, it is important to explore OpenMP inefficiency along with MPI communication between ranks. The lower a rank's communication spin time, the more of its time the rank spends executing application code, and the greater the impact of OpenMP tuning on that rank.
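The reasoning about spin time can be sketched in a few lines of Python: order the ranks by the fraction of elapsed time spent spinning in MPI communication, and look at OpenMP behavior on the ranks at the top of that list first. The function name `ranks_by_spin_fraction`, the input layout, and the numbers are all hypothetical, chosen only for illustration; they do not reflect the tool's data format.

```python
def ranks_by_spin_fraction(ranks):
    """Order ranks from lowest to highest MPI spin fraction.

    `ranks` maps a rank id to (mpi_spin_seconds, elapsed_seconds).
    Ranks with zero elapsed time are skipped to avoid division by zero.
    """
    fractions = {
        rank: spin / elapsed
        for rank, (spin, elapsed) in ranks.items()
        if elapsed > 0
    }
    return sorted(fractions.items(), key=lambda item: item[1])


# Hypothetical per-rank timings in seconds (not measured data).
sample = {0: (2.0, 40.0), 1: (12.0, 40.0), 2: (6.0, 40.0)}
for rank, fraction in ranks_by_spin_fraction(sample):
    print(f"rank {rank}: {fraction:.0%} of elapsed time in MPI spin")
```

With these made-up values, rank 0 spends the least time spinning, so it spends the most time executing and is the most representative place to start OpenMP tuning.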
Intel VTune Amplifier can be installed on a cluster. For further tuning of MPI, use Intel® Trace Analyzer and Collector.
The list shows OpenMP regions where performance tuning can significantly reduce execution time, with the highest impact regions shown first (see Fig. 3).