3 Tuning Secrets for Better OpenMP Performance Using VTune Amplifier XE

  • Overview

Parallelism delivers the capability that High Performance Computing (HPC) requires. It spans several layers: superscalar execution, vector instructions, threading, and distributed memory with message passing. OpenMP* is a commonly used threading abstraction, especially in HPC, and many HPC applications are moving to a hybrid shared-memory/distributed-memory programming model that uses both OpenMP* and MPI*. This webinar focuses on the OpenMP parallel model, and in particular on profiling the performance of OpenMP-based applications. Intel supplies a performance profiling tool, Intel® VTune™ Amplifier XE, that helps find performance bottlenecks in OpenMP code. In this webinar, we will go through the steps necessary to profile OpenMP applications and describe how you can quickly identify performance issues with task granularity, workload imbalance, and synchronization using Intel VTune Amplifier XE.
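To make the three issue classes concrete, here is a minimal sketch in C. It is not taken from the webinar. The names `work`, `sum_static`, and `sum_dynamic` are hypothetical, and the triangular cost pattern is an assumption chosen only because it makes the imbalance easy to see. With a static schedule, the thread that receives the last block of iterations does far more work than the first. A dynamic schedule hands out chunks on demand, the chunk size sets the granularity, and the `reduction` clause stands in for a lock or critical section around the shared sum.

```c
#include <assert.h>

/* Hypothetical kernel: iteration i performs about i additions, so the
   cost of an iteration grows with its index. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k < i; ++k)
        s += (double)k;
    return s;
}

/* Static schedule: iterations are divided into equal-sized contiguous
   blocks, so the thread holding the highest indices gets the most work
   while the others wait at the end of the loop (workload imbalance). */
double sum_static(int n) {
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* Dynamic schedule: idle threads fetch the next chunk of iterations.
   A small chunk balances better but adds scheduling overhead; a large
   chunk does the opposite (task granularity). The reduction clause
   gives each thread a private partial sum, avoiding a critical section
   on every update (synchronization). */
double sum_dynamic(int n, int chunk) {
    assert(chunk > 0);  /* OpenMP requires a positive chunk size */
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

Both functions return the same value, so a driver program can call them back to back and compare only their timing profiles. The pragmas take effect when OpenMP is enabled at compile time (for example `-fopenmp` with GCC); without it they are ignored and the loops run serially.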

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.