Analyze Vector Instruction Set with Intel® VTune™ Amplifier

Use Intel® VTune™ Amplifier to understand why the computation time of the 2/64 combination is worse than that of the 32/4 combination, even though the 2/64 elapsed time is much lower. A lower elapsed time is not achievable for the 32/4 combination because of the overhead of MPI deployments, so it is better to focus on improving the computation time of the 2/64 combination instead.

To analyze the application performance with VTune Amplifier:


You have completed the Analyzing OpenMP* and MPI Applications tutorial with Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel® VTune™ Amplifier. Here are some important things to remember when working with your own hybrid application:

Key Terms

Baseline: A performance metric used as a basis for comparing versions of the application before and after optimization. A baseline should be measurable and reproducible.
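To make "measurable and reproducible" concrete, the following Python snippet is a minimal sketch, not part of the tutorial's tooling: it times a workload several times, takes the median as the baseline so that one noisy run does not skew it, and expresses an optimized run as a speedup over that baseline. The function names `time_runs`, `baseline_from`, and `speedup`, and the sample workload, are hypothetical and chosen only for illustration.

```python
import statistics
import time
from typing import Callable, List


def time_runs(workload: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run the (hypothetical) workload several times; return each wall-clock duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def baseline_from(samples: List[float]) -> float:
    """Use the median so that a single outlier run does not distort the baseline."""
    return statistics.median(samples)


def speedup(baseline_s: float, optimized_s: float) -> float:
    """A value above 1.0 means the optimized version is faster than the baseline."""
    return baseline_s / optimized_s


if __name__ == "__main__":
    # Hypothetical stand-in for one run of the application.
    def workload() -> None:
        sum(i * i for i in range(200_000))

    base = baseline_from(time_runs(workload))
    print(f"baseline: {base:.4f} s")
```

Repeating the measurement and reporting a robust statistic is what makes the baseline reproducible; comparing later runs against the same number is what makes it useful as a basis for comparison.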

Computation Time: The time your application spends running without any additional overhead, such as initialization and finalization time. Computation time is included in Elapsed Time.
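The relationship between the two metrics can be illustrated with a short Python sketch, assuming a run that splits cleanly into hypothetical `init`, `compute`, and `finalize` phases (in a hybrid application these might correspond to MPI startup, the parallel solver, and MPI shutdown). Elapsed time spans all three phases, while computation time covers only the compute phase, which is why it is always contained in the elapsed time. This is an illustration of the definitions, not how Intel VTune Amplifier measures them.

```python
import time
from typing import Callable, Dict


def measure_phases(init: Callable[[], None],
                   compute: Callable[[], None],
                   finalize: Callable[[], None]) -> Dict[str, float]:
    """Time a run split into hypothetical init, compute, and finalize phases.

    Elapsed time covers all three phases; computation time covers only the
    compute phase; overhead is everything else (init plus finalize).
    """
    t0 = time.perf_counter()
    init()       # e.g. setup work that is overhead, not computation
    t1 = time.perf_counter()
    compute()    # the actual computation
    t2 = time.perf_counter()
    finalize()   # e.g. teardown work that is overhead, not computation
    t3 = time.perf_counter()
    return {
        "elapsed": t3 - t0,
        "computation": t2 - t1,
        "overhead": (t1 - t0) + (t3 - t2),
    }


if __name__ == "__main__":
    result = measure_phases(
        init=lambda: time.sleep(0.02),
        compute=lambda: sum(i * i for i in range(100_000)),
        finalize=lambda: time.sleep(0.01),
    )
    print(result)
```

A large gap between `elapsed` and `computation` points to overhead outside the computation itself, which matches the reasoning above about why MPI deployment overhead limits how far the elapsed time of a configuration can drop.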

