Summary

You have completed the Analyzing Locks and Waits tutorial. Here are some important things to remember when using the Intel® VTune™ Amplifier to analyze your code for locks and waits:

Step

Tutorial Recap

Key Tutorial Take-aways

1. Prepare for analysis

You built the target, created the performance baseline, and created the VTune Amplifier project for your analysis target. Your application is ready for analysis.

  • Create a performance baseline to compare the application versions before and after optimization. Make sure to use the same workload for each application run.

  • Create a VTune Amplifier project and use the Analysis Target tab to choose and configure your analysis target.
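A baseline is only useful if every run does the same work. As a minimal sketch of that idea, the helpers below time a fixed workload several times and keep the best wall-clock result. The names `measure_seconds` and `baseline_seconds` are hypothetical and are not part of the VTune Amplifier or the tutorial sources.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `workload` once and returns elapsed wall-clock seconds.
double measure_seconds(const std::function<void()>& workload) {
    const auto start = std::chrono::steady_clock::now();
    workload();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: repeats the same workload `runs` times (at least once)
// and returns the shortest time, which reduces noise from other processes.
double baseline_seconds(const std::function<void()>& workload, int runs) {
    double best = measure_seconds(workload);
    for (int i = 1; i < runs; ++i) {
        best = std::min(best, measure_seconds(workload));
    }
    return best;
}
```

Calling `baseline_seconds` with the same workload before and after a change gives two numbers that can be compared directly, provided the input data and thread count stay the same.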

2. Find lock

You ran the Locks and Waits data collection and identified the following hotspots:

  • A synchronization object with high Wait Time and Wait Count values and poor thread concurrency, which could be a lock that limits application parallelism. Your next step is to analyze the code of the function that waits on this object.

  • A code section that caused a significant wait, during which the processor was poorly utilized.

  • Use the Analysis Type tab to choose, configure, and run the analysis. You can also run the analysis from the command line using the amplxe-cl command.

  • Start analyzing the performance of your application with the Summary window to explore the performance metrics for the whole application. Then, move to the Bottom-up window to analyze the synchronization objects. Focus on the synchronization objects that under- or over-utilized the available logical CPUs and have the highest Wait Time and Wait Count values. By default, the objects with the highest Wait Time values show up at the top of the window.
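To make the two metrics concrete, here is a rough sketch of a mutex wrapper that counts how often a thread had to wait and how long it waited in total. This is only an illustration of what Wait Count and Wait Time describe. It is not how the VTune Amplifier collects its data, and `InstrumentedMutex` and `contended_demo` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical wrapper: records contended acquisitions of a std::mutex.
class InstrumentedMutex {
public:
    void lock() {
        if (m_.try_lock()) return;  // acquired immediately, no wait recorded
        const auto start = std::chrono::steady_clock::now();
        m_.lock();                  // blocks until the owner releases the lock
        const auto waited = std::chrono::steady_clock::now() - start;
        wait_count_.fetch_add(1);
        wait_ns_.fetch_add(
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count());
    }
    void unlock() { m_.unlock(); }

    long long wait_count() const { return wait_count_.load(); }
    double wait_seconds() const { return wait_ns_.load() / 1e9; }

private:
    std::mutex m_;
    std::atomic<long long> wait_count_{0};
    std::atomic<long long> wait_ns_{0};
};

// Hypothetical demo: the caller holds the lock for `hold` while a second
// thread tries to take it, so the second thread has to wait.
long long contended_demo(InstrumentedMutex& m, std::chrono::milliseconds hold) {
    m.lock();
    std::thread waiter([&m] { std::lock_guard<InstrumentedMutex> guard(m); });
    std::this_thread::sleep_for(hold);
    m.unlock();
    waiter.join();
    return m.wait_count();
}
```

An object with a high count and a long total wait while few threads are running is the kind of candidate the Bottom-up window sorts to the top.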

3. Remove lock

You optimized the application execution time by removing the unnecessary mutex that caused significant Wait Time.

  • Expand the most time-critical synchronization object in the Bottom-up pane and double-click the wait function it belongs to. This opens the source code for this wait function, where you can navigate to the most performance-critical source lines.
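The kind of fix described here can be sketched as follows, assuming each thread writes only to its own rows so that the lock protects nothing shared. The code is a simplified stand-in, not the tachyon source; `shade_row`, `render`, and `draw_mutex` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-row work; each call depends only on its own row index.
int shade_row(int row) {
    int acc = 0;
    for (int i = 0; i < 1000; ++i) acc += (row * 31 + i) % 7;
    return acc;
}

// Renders `rows` rows with `num_threads` threads (num_threads must be >= 1).
// With use_lock == true, every row is computed under one mutex, so the
// threads take turns. With use_lock == false, the threads run independently,
// which is safe here because each row is written by exactly one thread.
std::vector<int> render(int rows, int num_threads, bool use_lock) {
    std::vector<int> image(rows, 0);
    std::mutex draw_mutex;
    auto worker = [&](int tid) {
        for (int r = tid; r < rows; r += num_threads) {
            if (use_lock) {
                std::lock_guard<std::mutex> guard(draw_mutex);
                image[r] = shade_row(r);
            } else {
                image[r] = shade_row(r);
            }
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
    return image;
}
```

Both variants produce the same image; only the locked one serializes the threads. Removing a mutex is correct only when the guarded code really touches no shared state, so check that before deleting a lock in your own application.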

4. Check your work

You ran the Locks and Waits analysis on the optimized code and compared the results before and after optimization using the Compare mode of the VTune Amplifier. The comparison shows that, with the optimized version of the tachyon_analyze_locks application (r001lw result), you managed to remove the lock preventing application parallelism and significantly reduce the application execution time.

  • Perform regular regression testing by comparing analysis results before and after optimization. From the GUI, click the Compare Results button on the VTune Amplifier toolbar. From the command line, use the amplxe-cl command.

  • Expand each data column by clicking its expand button to identify the performance gain per thread concurrency level.

Next step: Prepare your own application(s) for analysis. Then use the VTune Amplifier to find and eliminate locks preventing parallelism.