Tutorial: Analyzing an OpenMP* and MPI Application

Intel® Trace Analyzer and Collector

Application Performance Snapshot

Intel® VTune™ Amplifier for Linux* OS


Discover how to use Intel® Parallel Studio to tune hybrid applications by finding inefficient MPI usage and balancing the load across threads.

About This Tutorial

This tutorial uses the heart_demo sample application to guide you through the basic steps of analyzing hybrid OpenMP* and MPI code for inefficiencies with three tools: Intel® VTune™ Amplifier's Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel VTune Amplifier itself.

This tutorial was last updated for the Intel Parallel Studio 2018 product release. The analysis was run on 8 cluster nodes with Intel® Xeon Phi™ processors (formerly code-named Knights Landing), each with 256 logical CPUs.
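To make the size of this test system concrete, the short Python sketch below works out the total number of logical CPUs and lists the ways one node could be split between MPI ranks and OpenMP threads. It is an illustration only, not part of the tutorial's tooling. The helper name hybrid_layouts is hypothetical, and the sketch assumes that "each" refers to a node and that a layout should occupy every logical CPU exactly once, which a real run does not have to do.

```python
# Illustrative sketch only; hybrid_layouts is a hypothetical helper.
NODES = 8                      # cluster nodes used for the tutorial's analysis
LOGICAL_CPUS_PER_NODE = 256    # assumption: the 256 logical CPUs are per node


def hybrid_layouts(logical_cpus):
    """Return (ranks_per_node, threads_per_rank) pairs whose product
    equals logical_cpus, so every logical CPU runs exactly one thread."""
    return [
        (ranks, logical_cpus // ranks)
        for ranks in range(1, logical_cpus + 1)
        if logical_cpus % ranks == 0
    ]


total_logical_cpus = NODES * LOGICAL_CPUS_PER_NODE
layouts = hybrid_layouts(LOGICAL_CPUS_PER_NODE)

print(f"Logical CPUs across the cluster: {total_logical_cpus}")
for ranks, threads in layouts:
    # NODES * ranks is the total number of MPI ranks for this layout.
    print(f"{ranks:3d} ranks/node x {threads:3d} threads/rank "
          f"-> {NODES * ranks:4d} MPI ranks in total")
```

Each printed row is one candidate combination of MPI ranks per node and OpenMP threads per rank. Which combination performs best depends on the application, which is the kind of question the tools in this tutorial help answer.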

Estimated Duration

Read tutorial: 10 minutes

Run through tutorial with sample application: 60+ minutes

Learning Objectives

After you complete this tutorial, you should be able to:

  • Build an application using the MPI library and Intel® C++ compiler.

  • Run the Application Performance Snapshot tool to get a high-level overview of performance optimization opportunities.

  • Run Intel Trace Analyzer and Collector to identify MPI-bound code.

  • Analyze the communication pattern of the source code.

  • Run the HPC Performance Characterization analysis in Intel VTune Amplifier to locate vectorization and parallelism issues in the sample code.

  • Compare results before and after optimization.
