Summary

You have completed the Finding Hotspots tutorial. Here are some important things to remember when using the Intel® VTune™ Amplifier to analyze your code for hotspots:

Each step below recaps what you did in the tutorial and then lists the key take-aways.

1. Prepare for analysis

If you used the Visual Studio* IDE: You chose the target for the Basic Hotspots analysis, configured Visual Studio to generate symbol information for your binary files, built the target in Release mode, and created the performance baseline.

If you used the standalone GUI: You set up your environment to generate symbol information for your binary files, built the target in Release mode, created the performance baseline, and created the VTune Amplifier project for your analysis target.

  • Configure your project properties to get the most accurate results for user binaries and to analyze the performance of your application at the code line level.

  • Create a performance baseline to compare the application versions before and after optimization. Make sure to use the same workload for each application run.

  • Use the New Amplifier Result tab to choose and configure your analysis target. For Visual Studio* projects, the analysis target settings are inherited automatically.

  • Use the Analysis Type configuration window to choose, configure, and run the analysis. You can also run the analysis from the command line using the amplxe-cl command.
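A performance baseline is simply the elapsed time of one fixed workload, recorded before you change anything. The sketch below shows one way to time such a run in C++ with `std::chrono::steady_clock`; `fixed_workload` is a hypothetical stand-in for your application, not part of the tutorial sample.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs one workload and returns its wall-clock (elapsed) time in milliseconds.
// Use the same workload for every run so baseline and optimized times are comparable.
double elapsed_ms(const std::function<void()>& workload) {
    assert(workload);  // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    workload();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical fixed workload standing in for the application under test.
long long fixed_workload(int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += static_cast<long long>(i) * i;
    return sum;
}
```

For example, `elapsed_ms([&] { result = fixed_workload(1000); })` times one run; repeat it with the same argument after each optimization.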

2. Find hotspots

You launched the Basic Hotspots data collection, which analyzed function calls and the CPU time spent in each program unit of your application, and identified the following hotspots:

  • A function that took the most CPU time and could be a good candidate for algorithm tuning.
  • The code section that took the most CPU time to execute.

  • Start analyzing the performance of your application from the Summary window to explore the performance metrics for the whole application. Then move to the Bottom-up window to analyze the performance per function. Focus on the hotspots: the functions that took the most CPU time. By default, they appear at the top of the table.

  • Double-click the hotspot function in the Bottom-up pane or Call Stack pane to open its source code and navigate between hotspots using the Source window navigation buttons.
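Conceptually, the default Bottom-up ordering is a descending sort of functions by CPU time, so the top row is the hottest function. The sketch below models that idea only; `FunctionTime`, `rank_hotspots`, and the function names and timings in the example are hypothetical and are not part of any VTune Amplifier API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Conceptual model only: one row of a Bottom-up-style table (not a VTune API type).
struct FunctionTime {
    std::string name;
    double cpu_time_s;
};

// Sorts rows so the function with the most CPU time comes first,
// mirroring the default ordering of the Bottom-up grid.
std::vector<FunctionTime> rank_hotspots(std::vector<FunctionTime> rows) {
    assert(!rows.empty());
    std::stable_sort(rows.begin(), rows.end(),
                     [](const FunctionTime& a, const FunctionTime& b) {
                         return a.cpu_time_s > b.cpu_time_s;
                     });
    return rows;
}
```

With hypothetical rows `{"parse_input", 0.4}`, `{"solve_grid", 3.5}`, and `{"write_output", 0.1}`, `rank_hotspots` puts `solve_grid` first, which is the row you would double-click to open its source.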

3. Eliminate hotspots

You optimized the algorithm by enabling the OpenMP* library to create a private copy of the array for each thread. You rebuilt the application and got a performance gain of 6202 ms.

Click the Source Editor button to open your default source editor directly from the VTune Amplifier Source window.
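The tutorial's sample source is not reproduced here, but the idea of giving each thread a private copy of an array can be sketched with the OpenMP `private` clause. In this hypothetical loop, `scratch` is declared outside the parallel region, so it would be shared by default and threads would overwrite each other's data; `private(scratch)` gives every thread its own uninitialized copy. The function name and the workload are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical loop: every iteration fills a scratch array and then sums it.
// private(scratch) gives each thread its own (uninitialized) copy of the array,
// which is safe here because each iteration writes every element before reading it.
// reduction(+:total) combines the per-thread partial sums at the end.
// Without OpenMP enabled (-fopenmp or /openmp), the pragma is ignored and the loop runs serially.
double sum_with_private_scratch(int n) {
    assert(n >= 0);
    double total = 0.0;
    double scratch[64];
    #pragma omp parallel for private(scratch) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < 64; ++k) scratch[k] = 0.5 * i + k;
        double s = 0.0;
        for (int k = 0; k < 64; ++k) s += scratch[k];
        total += s;
    }
    return total;
}
```

Each iteration contributes 32i + 2016, so `sum_with_private_scratch(10)` is 21600 whether the loop runs serially or in parallel.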

4. Analyze concurrency

You launched the Concurrency analysis and identified poor thread concurrency over the whole application run. You analyzed the timeline and identified poor thread balance: all OpenMP threads were constantly transferring execution to each other and waiting for the other threads to complete.

  • Start your analysis with the Summary window. Treat the Target concurrency metric shown in the CPU Usage Histogram as your optimization goal. The Average metric is calculated as CPU time / Elapsed time. Use this number as another baseline for your performance measurements. The closer this number is to the number of cores, the better.

  • In the Bottom-up window, use the Filter In by Selection context menu option to focus on the performance-critical functions in the grid and analyze their performance over time in the Timeline pane.
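The Average metric is plain arithmetic, so it can be checked by hand. The helpers below are hypothetical and compute CPU time / Elapsed time and how close that value is to the core count. Note that `std::thread::hardware_concurrency` reports logical CPUs (hardware threads), which can exceed the number of physical cores.

```cpp
#include <cassert>
#include <thread>

// Average concurrency as described above: CPU time / Elapsed time.
double average_concurrency(double cpu_time_s, double elapsed_time_s) {
    assert(elapsed_time_s > 0.0);
    return cpu_time_s / elapsed_time_s;
}

// Fraction of the available cores kept busy on average (1.0 would be ideal).
double concurrency_efficiency(double avg_concurrency, unsigned cores) {
    assert(cores > 0);
    return avg_concurrency / cores;
}

// Logical CPU count reported by the C++ runtime; it returns 0 when unknown, so fall back to 1.
unsigned logical_cores() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}
```

For instance, a hypothetical run with 12 s of CPU time over 4 s of elapsed time has an average concurrency of 3.0, which is 75% of a 4-core machine.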

5. Find lock

You ran the Locks and Waits analysis and identified the following hotspots:

  • Two synchronization objects with high Wait Time and Wait Count values and poor CPU utilization that could be locks affecting application parallelism. Your next step is to analyze the code of their wait functions.

  • The code sections that caused significant waits and numerous transitions between threads.

  • Use the Analysis Type configuration window to choose, configure, and run the analysis. For recently used analysis types, you can use these shortcuts to rerun an analysis:

    • In the standalone interface: From the File menu, select New > [recent_analysis_type].

    • In Visual Studio: Click the down arrow next to the New Analysis button on the VTune Amplifier toolbar and select the required analysis type from the drop-down list.

  • In the Bottom-up window, focus on the synchronization objects that under- or over-utilized the available logical CPUs and have the highest Wait Time and Wait Count values. By default, the objects with the highest Wait Time values appear at the top of the window.
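As an illustration of the kind of code that produces high Wait Time and Wait Count values, the hypothetical loop below makes every iteration enter the same OpenMP critical section to update a shared total. Threads queue on that single lock once per iteration, so a Locks and Waits analysis would likely attribute many waits to it. This is not the tutorial's sample code; the names and workload are assumptions.

```cpp
#include <cassert>

// Hypothetical per-iteration work whose cost grows with i.
long long triangular(int i) {
    long long s = 0;
    for (int k = 0; k <= i; ++k) s += k;
    return s;
}

// Anti-pattern: one critical section entered on every iteration.
// The result is correct, but threads repeatedly wait on the same lock.
long long sum_with_critical(int n) {
    assert(n >= 0);
    long long total = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const long long v = triangular(i);
        #pragma omp critical
        {
            total += v;  // serialized: only one thread at a time
        }
    }
    return total;
}
```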

6. Remove lock

You optimized the application execution time by removing the unnecessary critical section that caused redundant synchronization and by adding dynamic scheduling to balance the load.

Double-click the most time-critical synchronization object in the Bottom-up pane. This opens the source code for the wait function it belongs to. Use the hotspot navigation buttons to identify the most time-critical code lines.
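Both fixes from this step can be sketched with standard OpenMP clauses, under the assumption that the lock only protected a running sum. `reduction(+:total)` removes the critical section by giving each thread a private partial sum that OpenMP combines once at the end, and `schedule(dynamic)` hands out iterations on demand, so threads that draw cheap iterations pick up more work. The function names and workload are hypothetical and not taken from the tutorial's sample.

```cpp
#include <cassert>

// Hypothetical work whose cost grows with i, so splitting the iterations
// into contiguous static blocks would leave some threads with much more work.
long long uneven_work(int i) {
    long long s = 0;
    for (int k = 0; k <= i; ++k) s += k;
    return s;
}

// No critical section: each thread accumulates a private partial sum (reduction),
// and iterations are distributed on demand (dynamic schedule) to balance the load.
long long sum_balanced(int n) {
    assert(n >= 0);
    long long total = 0;
    #pragma omp parallel for reduction(+:total) schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        total += uneven_work(i);
    }
    return total;
}
```

The result matches the critical-section version (for n = 10 both give 165); only the synchronization and load distribution change.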

7. Check your work

You ran the Locks and Waits analysis on the optimized code and compared the results before and after optimization using the Compare mode of VTune Amplifier.

Perform regular regression testing by comparing analysis results before and after optimization. Click the Compare Results button on the VTune Amplifier toolbar. From the command line, use the amplxe-cl command.
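When you compare two runs, two numbers are usually worth recording: the time saved (a gain such as the 6202 ms in step 3 is this difference) and the speedup ratio. The hypothetical helper below computes both from baseline and optimized elapsed times measured on the same workload; the values in the example are made up.

```cpp
#include <cassert>

// Difference between two runs of the same workload (hypothetical helper).
struct Comparison {
    double delta_ms;  // baseline - optimized; positive means the new version is faster
    double speedup;   // baseline / optimized; above 1.0 means the new version is faster
};

Comparison compare_runs(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return {baseline_ms - optimized_ms, baseline_ms / optimized_ms};
}
```

For example, a hypothetical 10000 ms baseline and a 4000 ms optimized run give a 6000 ms gain and a 2.5x speedup.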

Next step: Prepare your own application(s) for analysis. Then use the VTune Amplifier to find and eliminate hotspots.

For more complete information about compiler optimizations, see our Optimization Notice.