You have confirmed that removing the mutex reduced the application's execution time. To understand the impact of your changes and how CPU utilization has changed, re-run the Locks and Waits analysis on the optimized code and compare the results.
Compare Results Before and After Optimization
Identify the Performance Gain at the Application Level
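The Elapsed Time data in the Summary window shows that the whole application now runs 4 seconds faster, and that Wait Time has decreased by 64.3 seconds. Wait Time is summed over all threads, so its reduction can be much larger than the reduction in elapsed time.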
According to the Thread Concurrency histogram, before optimization (dark blue bar) the application ran serially for 5.343 seconds, poorly utilizing the available processor cores. After optimization (light blue bar), it ran serially for only 0.697 seconds.
Identify the Performance Gain Per Program Unit
Click the Bottom-up tab to see the synchronization objects used in the code, their Wait Time in each of the two results, and the differences side by side.
In the Bottom-up pane, locate the Critical Section you identified as a bottleneck in your code. Because you removed it during optimization, the optimized result r001lw shows no performance data for this synchronization object. The comparison shows that the optimized result reduces Wait Time by almost 54 seconds.
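The side-by-side difference works per synchronization object: an object that no longer exists in the optimized result contributes its entire former Wait Time to the savings. The following is a minimal sketch of that arithmetic, assuming you have copied per-object Wait Time values out of the two results by hand. The function `wait_time_diff` and the object names and numbers in the usage lines are hypothetical and are not part of VTune Amplifier.

```python
# Hypothetical sketch: reproduce the per-object "difference" arithmetic of a
# side-by-side Bottom-up comparison. Object names and times are illustrative.

def wait_time_diff(before, after):
    """Return {object: before - after} for every object seen in either result.

    An object missing from one result (for example, a critical section that
    was removed during optimization) is treated as having 0 s of Wait Time
    in that result, so its whole former Wait Time shows up as savings.
    """
    objects = set(before) | set(after)
    return {obj: before.get(obj, 0.0) - after.get(obj, 0.0) for obj in sorted(objects)}


# Usage with made-up numbers (seconds of Wait Time summed over all threads):
r000 = {"Critical Section A": 10.0, "Thread Join B": 3.0}
r001 = {"Thread Join B": 2.5}  # "Critical Section A" was removed
for obj, saved in wait_time_diff(r000, r001).items():
    print(f"{obj}: {saved:+.1f} s saved")
```

A positive value means the optimized result waits less on that object; a negative value would point to an object that became a new source of waiting.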
Compare analysis results regularly to look for regressions and to track how incremental changes to the code affect its performance. You may also want to use the VTune Amplifier command-line interface and run the amplxe-cl command to test your code for regressions. For more details, see the Command Line Interface Support section in the VTune Amplifier online help.
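One way to automate such a regression test is to compare a few key timings from a baseline run against a new run and flag any that grew beyond a tolerance. The sketch below shows only that comparison step. It assumes you have already extracted the metric values (for example, from reports you generate with amplxe-cl); how to parse those reports is not shown. The function `find_regressions`, the metric names, and the 5% default tolerance are hypothetical choices, not VTune Amplifier features.

```python
# Hypothetical regression check: compare timings from a baseline result and a
# new result and flag any metric that grew by more than a relative tolerance.
# Metric names are placeholders, not VTune Amplifier output fields.

def find_regressions(baseline, current, tolerance=0.05):
    """Return {metric: (old, new)} for metrics that got worse by > tolerance.

    Both arguments map metric names to times in seconds, where lower is
    better. Metrics missing from the current run are skipped.
    """
    regressions = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is None:
            continue  # metric not reported in the new run
        if new > old * (1.0 + tolerance):
            regressions[name] = (old, new)
    return regressions


# Usage with made-up numbers:
baseline = {"elapsed_time": 20.0, "wait_time": 12.0, "serial_time": 0.7}
current = {"elapsed_time": 20.5, "wait_time": 15.0, "serial_time": 0.7}
print(find_regressions(baseline, current))
```

A relative tolerance keeps ordinary run-to-run noise from being reported as a regression; a stricter or absolute threshold may suit metrics with small values.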