Intel® Transactional Synchronization Extensions (Intel® TSX) provides hardware transactional memory support. It exposes a speculative execution mode to the programmer to improve locking performance. Many publications already cover Intel TSX, so this article does not explain the concept itself. For the most comprehensive list of TSX-related technical resources, see Roman Dementiev's blog.
Parallelism delivers the performance that High Performance Computing (HPC) requires. It spans several layers: superscalar execution, vector instructions, threading, and distributed memory with message passing. OpenMP* is a commonly used threading abstraction, especially in HPC. Many HPC applications are moving to a hybrid shared-memory/distributed-memory programming model in which both OpenMP* and MPI* are used.
[2013 Oct 17: Blog updated to split patch into two patches, one for Intel® VTune™ Amplifier changes and one for MKL/ifort changes.]
[2013 Oct 22: Support for Intel® VTune™ Amplifier became part of Julia master sources. Look for USE_INTEL_JITEVENTS in Julia/Make.inc for how to enable Amplifier support.]
FLOP/s is one of the most popular metrics for estimating performance. This document shares our experience using Intel VTune Amplifier XE to estimate FLOP/s.
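As a point of reference for the metric itself, FLOP/s is the number of floating-point operations executed divided by the elapsed time. The sketch below is not the VTune-based approach this document describes. It is a minimal, hand-counted illustration in Python, and the function names (`flops_per_second`, `dot`, `measure_dot_flops`) are hypothetical, introduced only for this example.

```python
import time


def flops_per_second(flop_count: int, elapsed_seconds: float) -> float:
    """Return the floating-point operation rate: operations / time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return flop_count / elapsed_seconds


def dot(x, y):
    """Dot product: one multiply and one add per pair of elements."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def measure_dot_flops(n: int = 100_000):
    """Time a dot product of length n and estimate its FLOP/s.

    The operation count is assumed by hand (n multiplies + n adds),
    not read from hardware counters.
    """
    x = [1.0] * n
    y = [2.0] * n
    start = time.perf_counter()
    result = dot(x, y)
    elapsed = time.perf_counter() - start
    flop_count = 2 * n
    return result, flop_count, flops_per_second(flop_count, elapsed)


if __name__ == "__main__":
    result, flop_count, rate = measure_dot_flops()
    print(f"result={result}, FLOP={flop_count}, rate={rate:.3e} FLOP/s")
```

A pure-Python loop mostly measures interpreter overhead, so the reported rate will sit far below what the hardware can deliver. Hand counting also works only when the operation count of the kernel is known in advance, which is one reason a profiler-based estimate is useful for real applications.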