Analyzing the Performance of an OpenMP* and MPI Application

Use Intel® Parallel Studio XE Cluster Edition to find the causes of inefficient code in a hybrid application by following a step-by-step workflow. This tutorial walks you through each workflow step using a sample OpenMP* and MPI application, heart_demo, which simulates electrophysiological heart activity.

Step 1: Build and Configure Application

  • Build the heart_demo sample application.
  • Test different combinations of OpenMP threads and MPI processes.
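When testing combinations, a common starting point is to try every split of a node's cores into MPI ranks per node times OpenMP threads per rank, so that the node is fully used but not oversubscribed. The minimal C sketch below enumerates those splits. The helper name rank_thread_splits and the "ranks x threads == cores" rule are assumptions for illustration, not part of heart_demo or the Intel tools.

```c
#include <assert.h>

/* Hypothetical helper: write every pair (MPI ranks per node, OpenMP threads
 * per rank) whose product equals `cores` into ranks[] and threads[].
 * Each pair fully subscribes one node without oversubscription.
 * Returns the number of pairs written (never more than `max`). */
static int rank_thread_splits(int cores, int *ranks, int *threads, int max)
{
    assert(cores > 0 && max >= 0);
    int n = 0;
    for (int r = 1; r <= cores && n < max; ++r) {
        if (cores % r == 0) {       /* r ranks divide the cores evenly */
            ranks[n] = r;
            threads[n] = cores / r; /* threads per rank for this split */
            ++n;
        }
    }
    return n;
}
```

For a hypothetical 16-core node this yields five candidates: 1x16, 2x8, 4x4, 8x2, and 16x1. Each candidate would then be run with OMP_NUM_THREADS set to the thread count and the matching number of ranks per node, and the elapsed times compared.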

Step 2: Get a Performance Overview with Application Performance Snapshot

  • Run Application Performance Snapshot.
  • Interpret result data.

Step 3: Identify Communication Issues with Intel Trace Analyzer and Collector

  • Set up the Intel Trace Analyzer and Collector environment.
  • Run the application with Intel Trace Analyzer and Collector enabled.

Step 4: Tune MPI-Bound Code

  • Review the message profile chart.
  • Update the application code for MPI communication issues.
  • Run Application Performance Snapshot on the updated application.
  • Test application performance.
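The message profile in Intel Trace Analyzer and Collector presents point-to-point message statistics grouped by sender and receiver. To make that aggregation concrete, the C sketch below builds a sender-by-receiver matrix of total bytes from a list of messages and reports the busiest pair. The struct msg record, the fixed NRANKS value, and both function names are hypothetical; they illustrate the idea only and do not reproduce the tool's trace format.

```c
#include <assert.h>
#include <stddef.h>

#define NRANKS 4 /* hypothetical number of MPI ranks in the trace */

/* One point-to-point message (hypothetical record, not the real trace format). */
struct msg {
    int src;    /* sending rank */
    int dst;    /* receiving rank */
    long bytes; /* message size */
};

/* Sum message sizes into vol[src][dst], a sender-by-receiver volume matrix. */
static void message_profile(const struct msg *trace, size_t n, long vol[NRANKS][NRANKS])
{
    for (int s = 0; s < NRANKS; ++s)
        for (int d = 0; d < NRANKS; ++d)
            vol[s][d] = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(trace[i].src >= 0 && trace[i].src < NRANKS);
        assert(trace[i].dst >= 0 && trace[i].dst < NRANKS);
        vol[trace[i].src][trace[i].dst] += trace[i].bytes;
    }
}

/* Return src * NRANKS + dst for the sender/receiver pair with the most bytes. */
static int busiest_pair(long vol[NRANKS][NRANKS])
{
    int best = 0;
    for (int s = 0; s < NRANKS; ++s)
        for (int d = 0; d < NRANKS; ++d)
            if (vol[s][d] > vol[best / NRANKS][best % NRANKS])
                best = s * NRANKS + d;
    return best;
}
```

A matrix that is heavily skewed toward a few pairs, or a single very large entry, is the kind of pattern that points to the communication code worth revisiting.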

Step 5: Analyze Vector Instruction Set with Intel VTune Amplifier

  • Use the Intel MPI -gtool option to run the Intel VTune Amplifier HPC Performance Characterization analysis from a command prompt.
  • Review the analysis data to identify legacy instruction set usage.
  • Fix the vector instruction set usage.
  • Test application performance.
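Which vector instructions a loop uses (for example, older SSE versus AVX) is decided by the compiler's target instruction set options, such as the Intel compiler's -x family (for example, -xHost), as long as the loop itself is vectorizable. The C sketch below shows a hypothetical kernel written so the compiler can vectorize it for whatever instruction set is targeted. The function name explicit_step and the update it performs are assumptions, not heart_demo code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: one explicit time step, u[i] += dt * rhs[i].
 * There are no dependences between iterations, and `restrict` tells the
 * compiler the arrays do not overlap, so the loop can be vectorized.
 * The `omp simd` pragma is honored when OpenMP (or OpenMP SIMD) support is
 * enabled at compile time; otherwise it is ignored and the loop is unchanged. */
static void explicit_step(size_t n, double dt, double *restrict u,
                          const double *restrict rhs)
{
    assert(n == 0 || (u != NULL && rhs != NULL));
#pragma omp simd
    for (size_t i = 0; i < n; ++i)
        u[i] += dt * rhs[i];
}
```

The source stays the same across instruction sets; recompiling with a newer target is what lets the compiler replace legacy vector instructions with wider ones.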

Step 6: Analyze Serial and Parallel Code Efficiency with Intel VTune Amplifier

  • Run the HPC Performance Characterization analysis on the application.
  • Identify functions that would benefit from parallelism at the threading level.
  • Update the application code to use parallelized functions.
  • Test application performance.
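A typical fix for a hotspot that runs on a single thread is an OpenMP worksharing loop. The C sketch below parallelizes a hypothetical serial reduction with parallel for and a reduction clause; sum_of_squares is an illustrative name and is not taken from heart_demo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial hotspot: sum of squares over an array.
 * `parallel for` divides the iterations among OpenMP threads, and
 * `reduction(+:sum)` gives each thread a private partial sum that is
 * combined when the loop finishes. Without OpenMP enabled, the pragma is
 * ignored and the loop runs serially; results can differ only by
 * floating-point rounding from the changed summation order. */
static double sum_of_squares(const double *x, long n)
{
    assert(n == 0 || x != NULL);
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * x[i];
    return sum;
}
```

Using a reduction rather than a shared accumulator avoids a data race without adding a critical section inside the loop.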


For more complete information about compiler optimizations, see our Optimization Notice.