Intel® Trace Analyzer and Collector is a powerful tool for analyzing MPI applications, which essentially consists of two parts:
This page provides the current Release Notes for Intel® Trace Analyzer and Collector. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.
Click a version to expand a summary of the new features and changes since the previous release and to access the download buttons for the detailed release notes. The detailed notes include important information such as prerequisites, software compatibility, installation instructions, and known issues.
There are three key reasons for an application to be MPI-bound:
High wait times inside the MPI library. This occurs when a process waits for data from other processes. This case is characterized by high values of the MPI Imbalance indicator.
Suboptimal or incorrectly configured optimization settings of the MPI library.
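The wait-time case can be illustrated with a small model. The following Python sketch is not how Intel tools compute the MPI Imbalance indicator; it assumes a simplified situation in which every rank computes for some time and then blocks at a synchronization point until the slowest rank arrives. The function names and the sample timings are hypothetical.

```python
def wait_times(compute_times):
    """Return per-rank wait time at a synchronization point.

    Simplified model (an assumption): each rank finishes its computation,
    then idles inside the MPI call until the slowest rank arrives.
    """
    slowest = max(compute_times)
    return [slowest - t for t in compute_times]


def imbalance_fraction(compute_times):
    """Fraction of total rank time spent waiting in this model."""
    slowest = max(compute_times)
    total = slowest * len(compute_times)
    return sum(wait_times(compute_times)) / total


# Hypothetical per-rank compute times in seconds, for illustration only.
sample = [4.0, 4.0, 4.0, 8.0]
print(wait_times(sample))          # [4.0, 4.0, 4.0, 0.0]
print(imbalance_fraction(sample))  # 0.375
```

In this model a single slow rank makes the other three ranks spend half of their time waiting, which is the kind of pattern a high imbalance value points to.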
To improve performance of the heart_demo application, it is necessary to change the communication pattern.
Use Intel® VTune™ Amplifier to understand why the computation time of the 2/64 combination is worse than that of the 32/4 combination even though its elapsed time is much lower. A lower elapsed time for 32/4 is not possible due to the overhead of MPI deployments. As a result, it is better to focus on improving the computation time for the 2/64 combination instead.
To analyze the application performance with VTune Amplifier:
After updating the vector instruction set, collect performance data again with Intel VTune Amplifier to find additional optimization opportunities.
You have completed the Analyzing OpenMP* and MPI Applications tutorial with Application Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel® VTune™ Amplifier. Here are some important things to remember when working with your own hybrid application:
Baseline: A performance metric used as a basis for comparing application versions before and after optimization. A baseline should be measurable and reproducible.
Computation Time: The time your application ran without any additional overhead (initialization time, finalization time, etc.). Computation time is included in Elapsed Time.
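The two definitions above can be tied together with a short Python sketch. It assumes that computation time is obtained by subtracting initialization and finalization overhead from elapsed time, and it treats a baseline as reproducible when repeated runs stay within a chosen relative spread. The 5% tolerance, the function names, and all timings are hypothetical and are not taken from any Intel tool.

```python
from statistics import mean


def computation_time(elapsed, init, finalize):
    """Elapsed time minus initialization and finalization overhead."""
    return elapsed - init - finalize


def is_reproducible(runs, tolerance=0.05):
    """True when the spread of repeated runs is within `tolerance` of the mean.

    The 5% default is an arbitrary illustrative threshold.
    """
    avg = mean(runs)
    return (max(runs) - min(runs)) / avg <= tolerance


def speedup(baseline, optimized):
    """How many times faster the optimized version is than the baseline."""
    return baseline / optimized


baseline_runs = [50.0, 51.0, 49.0]         # hypothetical elapsed times, seconds
print(is_reproducible(baseline_runs))      # True
print(computation_time(50.0, 3.0, 2.0))    # 45.0
print(speedup(mean(baseline_runs), 25.0))  # 2.0
```

Checking reproducibility before optimizing helps ensure that a later change in the metric reflects the optimization and not run-to-run noise.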
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.