Summary

You have completed the Analyzing Locks and Waits tutorial. Here are some important things to remember when using the Intel® VTune™ Amplifier to analyze your code for locks and waits:

Each step below gives the tutorial recap first, followed by the key tutorial take-aways.

1. Prepare for analysis

If you used the Visual Studio* IDE: You selected the analyze_locks project as the target for the Locks and Waits analysis.

If you used the standalone GUI: You set up your environment to enable generating symbol information for system libraries and your binary files, built the target in the Release mode, created the performance baseline, and created the VTune Amplifier project for your analysis target. Your application is ready for analysis.

  • Configure the Microsoft* symbol server and your project properties to get the most accurate results for system and user binaries and to analyze the performance of your application at the code line level.

  • Create a performance baseline to compare the application versions before and after optimization. Make sure to use the same workload for each application run.

  • Use the Project Properties: Target tab to choose and configure your analysis target. For Visual Studio* projects, the analysis target settings are inherited automatically.
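
The baseline advice above can be illustrated with a small timing harness. This is only a sketch, not part of the tutorial sample: `best_elapsed_ms` and `sum_to` are hypothetical names, and `sum_to` stands in for whatever fixed workload your application runs. The point it shows is that the same workload is timed the same way before and after optimization.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper (not a VTune Amplifier API): runs `workload` `runs`
// times and returns the smallest wall-clock time in milliseconds.
double best_elapsed_ms(const std::function<void()>& workload, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double best = 0.0;
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        workload();
        const std::chrono::duration<double, std::milli> elapsed =
            clock::now() - start;
        if (i == 0 || elapsed.count() < best) {
            best = elapsed.count();
        }
    }
    return best;
}

// Placeholder workload standing in for the application under test.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Calling `best_elapsed_ms` with an identical workload for the original and the optimized build gives two numbers that can be compared directly. Taking the best of several runs is one simple way to reduce noise; other statistics may suit your application better.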

2. Find lock

You ran the Locks and Waits data collection and identified the following hotspots:

  • A synchronization object with high Wait Time and Wait Count values and poor thread concurrency, which could be a lock affecting application parallelism. Your next step is to analyze the code of the wait function that uses this object.

  • A code section that caused a significant wait, during which the processor was poorly utilized.

  • Use the Analysis Type configuration window to choose, configure, and run the analysis. You can also run the analysis from the command line using the amplxe-cl command.

  • Start with the Summary window to explore the performance metrics for the whole application. Then move to the Bottom-up window to analyze the synchronization objects. Focus on the synchronization objects that under- or over-utilized the available logical CPUs and have the highest Wait Time and Wait Count values. By default, the objects with the highest Wait Time values appear at the top of the window.
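
To make the Wait Time and Wait Count metrics more concrete, the following sketch instruments a mutex by hand. It is an illustration only and does not describe how the VTune Amplifier collects its data: `InstrumentedMutex` and `run_contended` are hypothetical names, and the wrapper counts a wait only when the first `try_lock` attempt fails.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper (not part of VTune Amplifier): records how often and
// for how long threads block while acquiring the underlying mutex.
class InstrumentedMutex {
public:
    void lock() {
        if (mutex_.try_lock()) {
            return;  // acquired immediately: no wait recorded
        }
        // try_lock may also fail spuriously, so a few very short waits
        // can be recorded even without real contention.
        const auto start = std::chrono::steady_clock::now();
        mutex_.lock();  // block until the lock becomes available
        const auto waited = std::chrono::steady_clock::now() - start;
        wait_count_.fetch_add(1, std::memory_order_relaxed);
        wait_ns_.fetch_add(
            static_cast<std::uint64_t>(
                std::chrono::duration_cast<std::chrono::nanoseconds>(waited)
                    .count()),
            std::memory_order_relaxed);
    }

    void unlock() { mutex_.unlock(); }

    std::uint64_t wait_count() const { return wait_count_.load(); }
    std::uint64_t wait_ns() const { return wait_ns_.load(); }

private:
    std::mutex mutex_;
    std::atomic<std::uint64_t> wait_count_{0};
    std::atomic<std::uint64_t> wait_ns_{0};
};

// Starts `threads` workers that each take the lock `iterations` times and
// increment a shared counter; returns the final counter value.
std::uint64_t run_contended(InstrumentedMutex& m, int threads, int iterations) {
    std::uint64_t counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<InstrumentedMutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& worker : pool) {
        worker.join();
    }
    return counter;
}
```

After `run_contended` returns, a high `wait_count()` together with a large `wait_ns()` is the kind of signal the Bottom-up window surfaces for a synchronization object, although the tool's own measurements are more detailed.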

3. Remove lock

You optimized the application execution time by removing the unnecessary mutex that caused significant Wait Time.

Expand the most time-critical synchronization object in the Bottom-up pane and double-click the wait function it belongs to. This opens the source code for the wait function, where you can navigate to the most performance-critical source lines.
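
The kind of change made in this step can be sketched as follows. This is not the analyze_locks source: `shade`, `for_each_slice`, `render_locked`, and `render_unlocked` are hypothetical names, and the sketch assumes that each thread writes only to its own disjoint slice of the output, which is what makes the mutex unnecessary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Placeholder per-element computation standing in for the real work.
int shade(std::size_t i) { return static_cast<int>((i * 31u) % 255u); }

// Splits [0, size) into contiguous slices and runs `body(begin, end)` on
// one thread per slice.
template <typename Body>
void for_each_slice(std::size_t size, unsigned threads, Body body) {
    assert(threads > 0);
    const std::size_t chunk = (size + threads - 1) / threads;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(size, t * chunk);
        const std::size_t end = std::min(size, begin + chunk);
        pool.emplace_back(body, begin, end);
    }
    for (auto& worker : pool) {
        worker.join();
    }
}

// Before: every write takes a shared mutex even though the slices are
// disjoint, so threads spend much of their time waiting on each other.
std::vector<int> render_locked(std::size_t size, unsigned threads) {
    std::vector<int> out(size);
    std::mutex m;
    for_each_slice(size, threads, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            std::lock_guard<std::mutex> guard(m);  // unnecessary lock
            out[i] = shade(i);
        }
    });
    return out;
}

// After: the mutex is removed; each thread still writes only its own slice.
std::vector<int> render_unlocked(std::size_t size, unsigned threads) {
    std::vector<int> out(size);
    for_each_slice(size, threads, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            out[i] = shade(i);
        }
    });
    return out;
}
```

Both functions produce the same output, but only the second lets the threads run concurrently. Removing a lock is safe only when the guarded data really is not shared between threads, so confirm that in your own code before making a similar change.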

4. Check your work

You ran the Locks and Waits analysis on the optimized code and compared the results before and after optimization using the Compare mode of the VTune Amplifier. The comparison shows that, with the optimized version of the analyze_locks application (r001lw result), you managed to remove the lock preventing application parallelism and significantly reduce the application execution time.

  • Perform regular regression testing by comparing analysis results before and after optimization. From the GUI, click the Compare Results button on the VTune Amplifier toolbar. From the command line, use the amplxe-cl command.

  • Expand each data column by clicking its expand button to identify the performance gain per thread concurrency level.

Next step: Prepare your own application(s) for analysis. Then use the VTune Amplifier to find and eliminate locks preventing parallelism.