Tutorial

Analyze MPI Applications with Intel® Trace Analyzer and Collector and Intel® VTune™ Profiler

ID 773180
Date 3/31/2023
Public

Summary

You have completed the Analyze MPI Applications with Intel Trace Analyzer and Collector and VTune Profiler tutorial. The following summarizes the key points to remember when you use these tools to analyze and tune your application.

Each step below lists the tutorial recap first, followed by the key tutorial take-aways.

1. Optimize MPI communications

  Tutorial recap:
  • Prepared for the application analysis.
  • Used the Event Timeline, Function Profile, Message Profile, and Imbalance Diagram to detect serialization that slows down the application.
  • Removed the serialization by replacing the function that caused it.
  • Compared the original trace file with the trace file of the revised application.
  • Analyzed the improved communications in the Event Timeline.

  Key take-aways:
  • Ungroup MPI functions to identify which functions slow down the application.
  • Use the Function Profile and Message Profile charts to see how much time is spent in MPI.
  • Generate the idealized trace and compare it with the original trace to see how your application behaves under ideal circumstances and to isolate problematic interactions.
  • In real-world cases, you may need to formulate a hypothesis about how the program should behave and check it using the most suitable chart.
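
To make the idea of serialization concrete, here is a minimal sketch of a toy cost model. It is not output from Intel Trace Analyzer and Collector, and it makes no MPI calls. The function names, the rank count, and the timings are all assumptions chosen for illustration. It compares a pattern in which each rank's exchange must wait for the previous rank's exchange against one in which all exchanges can overlap, and it estimates the share of the run spent communicating.

```python
# Toy cost model (an assumption for illustration), not a real trace analysis.

def serialized_finish_times(num_ranks, transfer_time):
    """Finish time per rank when each exchange waits for the previous rank's."""
    return [(rank + 1) * transfer_time for rank in range(num_ranks)]


def concurrent_finish_times(num_ranks, transfer_time):
    """Finish time per rank when all exchanges can proceed at the same time."""
    return [transfer_time] * num_ranks


def communication_share(compute_time, finish_times):
    """Fraction of the run spent communicating, judged by the slowest rank."""
    slowest = max(finish_times)
    return slowest / (slowest + compute_time)


if __name__ == "__main__":
    # Hypothetical values: 4 ranks, 2.0 s per exchange, 8.0 s of computation.
    ranks, transfer, compute = 4, 2.0, 8.0
    for label, model in (("serialized", serialized_finish_times),
                         ("concurrent", concurrent_finish_times)):
        times = model(ranks, transfer)
        print(label, times, round(communication_share(compute, times), 2))
```

In this model the serialized pattern makes the last rank wait for every earlier exchange, so communication time grows with the number of ranks. A staircase of this kind is one hypothesis you could check against the Event Timeline of a real trace.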

2. Improve intra-process performance

  Tutorial recap:
  • Built the target and launched the Basic Hotspots data collection using the interoperability features of the tools.
  • Analyzed function calls and CPU time spent in each program unit of your application and identified the function that took the most CPU time.
  • Found a possible way to resolve the issue and optimize the source code.

  Key take-aways:
  • Start analyzing the performance of your application from the Summary window to explore the performance metrics for the whole application.
  • Then move to the Bottom-up window to analyze the performance per function. Focus on the hotspots, the functions that took the most CPU time. By default, they are located at the top of the table.
  • Double-click the hotspot function in the Bottom-up pane or Call Stack pane to open its source code.
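
The hotspot-first ordering described above can be sketched in a few lines. This is only an illustration of the ranking idea, not the VTune Profiler API or its export format. The function names and CPU times are hypothetical.

```python
# Illustrative ranking of functions by CPU time; the data is hypothetical.

def rank_hotspots(cpu_time_by_function):
    """Return (name, seconds, share of total) tuples, largest CPU time first."""
    total = sum(cpu_time_by_function.values())
    ranked = sorted(cpu_time_by_function.items(),
                    key=lambda item: item[1], reverse=True)
    return [(name, seconds, seconds / total) for name, seconds in ranked]


if __name__ == "__main__":
    # Hypothetical per-function CPU times in seconds.
    sample = {"exchange": 1.5, "compute_kernel": 6.0, "init": 0.5}
    for name, seconds, share in rank_hotspots(sample):
        print(f"{name:15s} {seconds:5.1f} s  {share:6.1%}")
```

The first entry of the result is the function to examine first, which mirrors starting at the top of the Bottom-up table.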

Next step: Use the Intel Trace Analyzer and Collector and VTune Profiler to analyze your own application.