Compare with Previous Result

You verified that removing the mutex reduced the application execution time by 14.196 seconds. To understand the impact of your changes and how CPU utilization has changed, re-run the Locks and Waits analysis on the optimized code and compare the results:

  1. Compare results before and after optimization.

  2. Identify the performance gain.

Compare Results Before and After Optimization

  1. Run the Locks and Waits analysis on the modified code.

  2. Click the Compare Results button on the Intel® VTune™ Amplifier toolbar.

    The Compare Results window opens.

  3. Specify the Locks and Waits analysis results you want to compare.

Identify the Performance Gain at the Application Level

The Summary window opens, providing statistics on the difference between the collected results.

The Elapsed time data in the Summary window shows that the execution time of the whole application decreased by 13 seconds and that Wait time decreased by 143.5 seconds. Spin Time also decreased significantly, although it is still above the threshold.

According to the Thread Concurrency histogram, before optimization (blue bar) the application ran serially for 17.680 seconds, poorly utilizing the available processor cores; after optimization (orange bar) it ran serially for only 1.5 seconds.
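The relative gain follows directly from the two histogram values quoted above. The short Python sketch below does that arithmetic; the helper name `reduction` is illustrative only and is not part of VTune Amplifier.

```python
# Serial (single-thread) time read from the Thread Concurrency histogram
# in this tutorial: before optimization vs. after optimization.
serial_before_s = 17.680
serial_after_s = 1.5


def reduction(before: float, after: float) -> tuple[float, float]:
    """Return the absolute and relative (percent) reduction from before to after."""
    if before <= 0:
        raise ValueError("baseline value must be positive")
    delta = before - after
    return delta, 100.0 * delta / before


delta_s, pct = reduction(serial_before_s, serial_after_s)
print(f"Serial time reduced by {delta_s:.2f} s ({pct:.1f}%)")
# prints: Serial time reduced by 16.18 s (91.5%)
```

Reporting both numbers is useful: the absolute value tells you how much wall-clock time was recovered, while the percentage shows how much of the original serial execution was eliminated.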

Identify the Performance Gain Per Program Unit

Click the Bottom-up tab to see the list of synchronization objects used in the code, Wait time utilization across the two results, and the differences side by side:

  1. Difference in Wait time per concurrency level between the two results, in the format <Difference Wait Time> = <Result 1 Wait Time> - <Result 2 Wait Time>. You can expand the Difference column to display comparison data per concurrency level; by default, only the total Wait time difference is shown.

  2. Wait time and thread concurrency level for the initial version of the code.

  3. Wait time and thread concurrency level for the optimized version of the code.

  4. Difference in Wait count between the two results, in the format <Difference Wait Count> = <Result 1 Wait Count> - <Result 2 Wait Count>.

  5. Difference in Spin Time between the two results, in the format <Difference Spin Time> = <Result 1 Spin Time> - <Result 2 Spin Time>.

In the Bottom-up pane, locate the Critical Section you identified as a bottleneck in your code. Because you removed it during optimization, the optimized result r001lw shows no performance data for this synchronization object. For this object, Wait time decreased by almost 121 seconds in the optimized result.
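To make the Difference formulas concrete, here is a minimal Python sketch of how a per-object difference column can be computed. It is not VTune Amplifier code: the `SyncMetrics` class, the object names, and all numbers are hypothetical and are not the tutorial's data. It also assumes that an object missing from a result (such as the removed Critical Section in r001lw) contributes zero, so its difference equals its full value in the first result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncMetrics:
    """Per-object metrics similar to the Bottom-up grid columns (hypothetical)."""
    wait_time_s: float = 0.0
    wait_count: int = 0
    spin_time_s: float = 0.0


ZERO = SyncMetrics()


def difference(result1: dict[str, SyncMetrics],
               result2: dict[str, SyncMetrics]) -> dict[str, SyncMetrics]:
    """<Difference> = <Result 1> - <Result 2> for every object in either result.

    Assumption: an object absent from a result (for example, a lock removed
    during optimization) contributes zero to that result.
    """
    diff = {}
    for name in sorted(result1.keys() | result2.keys()):
        a = result1.get(name, ZERO)
        b = result2.get(name, ZERO)
        diff[name] = SyncMetrics(
            wait_time_s=a.wait_time_s - b.wait_time_s,
            wait_count=a.wait_count - b.wait_count,
            spin_time_s=a.spin_time_s - b.spin_time_s,
        )
    return diff


# Hypothetical values for illustration only.
baseline = {
    "Critical Section": SyncMetrics(120.0, 5000, 2.0),
    "Thread Join": SyncMetrics(10.0, 4, 0.0),
}
optimized = {
    "Thread Join": SyncMetrics(9.5, 4, 0.0),  # the critical section is gone
}

for name, d in difference(baseline, optimized).items():
    print(f"{name}: wait {d.wait_time_s:+.1f} s, "
          f"count {d.wait_count:+d}, spin {d.spin_time_s:+.1f} s")
```

A positive difference means the metric went down in the second result, which matches the sign convention of the formulas listed earlier.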

Compare analysis results regularly to look for regressions and to track how incremental changes to the code affect its performance. You may also want to use the VTune Amplifier command-line interface and run the amplxe-cl command to test your code for regressions. For more details, see the Command-line Interface Support section in the VTune Amplifier online help.
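A command-line regression check might look like the Python sketch below. It rests on assumptions you should verify in the Command-line Interface Support section: that `amplxe-cl -report summary` accepts two `-r` result directories to produce a comparison report, and that you extract the elapsed times from that report yourself (the parsing is not shown). The result directory names, the `compare_report_command` and `is_regression` helpers, and the 5% tolerance are hypothetical.

```python
import shlex


def compare_report_command(baseline_dir: str, new_dir: str,
                           report: str = "summary") -> list[str]:
    """Build an amplxe-cl command that reports the difference between two results.

    Assumption: passing -r twice to -report produces a comparison report;
    confirm the exact options in the VTune Amplifier command-line help.
    """
    return ["amplxe-cl", "-report", report, "-r", baseline_dir, "-r", new_dir]


def is_regression(baseline_s: float, new_s: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when new_s exceeds baseline_s by more than `tolerance` (a fraction)."""
    return new_s > baseline_s * (1.0 + tolerance)


# Hypothetical result directory names.
cmd = compare_report_command("r000lw", "r001lw")
print(shlex.join(cmd))  # run it with subprocess.run(cmd, check=True) if amplxe-cl is on PATH

# Hypothetical elapsed times, in seconds, taken from the two results.
print(is_regression(10.0, 10.3))  # within tolerance
print(is_regression(10.0, 11.0))  # slower by 10%
```

A relative tolerance absorbs normal run-to-run noise; tighten it if your timings are stable, or loosen it on shared machines.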
