It finally happened!
(This work was done by Vivek Lingegowda during his internship at Intel.)
Most multi-threaded software uses locking. Lock optimization has traditionally aimed to reduce lock contention, that is, to make the critical regions smaller.
Update: This article applies to versions of Intel® VTune™ Amplifier up to 2018.
Tim Mattson (Intel) has authored an extensive series of excellent videos as an introduction to OpenMP*.
Modern high performance computers are built with a combination of resources including: