Intel has released details of Intel® Transactional Synchronization Extensions (Intel® TSX) for the future multicore processor code-named “Haswell”. The updated specification (Intel® Architecture Instruction Set Extensions Programming Reference) can be downloaded.
In this blog, I’ll introduce Intel TSX and provide a little background. Please refer to The Transactional Synchronization Extensions Chapter (Chapter 8) in the manual for additional information. These new synchronization extensions (Intel TSX) are useful in shared-memory multithreaded applications that employ lock-based synchronization mechanisms.
In a nutshell, Intel TSX provides a set of instruction set extensions that allow programmers to specify regions of code for transactional synchronization. Programmers can use these extensions to achieve the performance of fine-grain locking while actually programming using coarse-grain locks. I have written a simple illustrative example in my blog “Coarse-grained locks and Transactional Synchronization explained.”
Locks are a low-level programming construct (close to the hardware), so any discussion of Intel TSX will be low level too. How Intel TSX might affect higher-level programming methods, or enable new programming models, is beyond the scope of my blog but I will briefly comment on it at the end of this blog.
Why is this useful?
With transactional synchronization, the hardware can determine dynamically whether threads need to serialize through lock-protected critical sections, and perform serialization only when required. This lets the processor expose and exploit concurrency that would otherwise be hidden due to dynamically unnecessary synchronization.
At the lowest level with Intel TSX, programmer-specified code regions (also referred to as transactional regions) are executed transactionally. If the transactional execution completes successfully, then all memory operations performed within the transactional region will appear to have occurred instantaneously when viewed from other logical processors. A processor makes architectural updates performed within the region visible to other logical processors only on a successful commit, a process referred to as an atomic commit.
These extensions can help achieve the performance of fine-grain locking while using coarser grain locks. These extensions can also allow locks around critical sections while avoiding unnecessary serializations. If multiple threads execute critical sections protected by the same lock but they do not perform any conflicting operations on each other’s data, then the threads can execute concurrently and without serialization. Even though the software uses lock acquisition operations on a common lock, the hardware is allowed to recognize this, elide the lock, and execute the critical sections on the two threads without requiring any communication through the lock if such communication was dynamically unnecessary.
Intel TSX Interfaces
Intel TSX provides two software interfaces. The first, called Hardware Lock Elision (HLE) is a legacy compatible instruction set extension (comprised of the XACQUIRE and XRELEASE prefixes) that are used to specify transactional regions. HLE is compatible with the conventional lock-based programming model. Software written using the HLE hints can run on both legacy hardware without TSX and new hardware with TSX. The second, called Restricted Transactional Memory (RTM) is a new instruction set interface (comprised of the XBEGIN, XEND, and XABORT instructions) that allows programmers to define transactional regions in a more flexible manner than is possible with HLE. Unlike the HLE extensions, but just like most new instruction set extensions, the RTM instructions will generate an undefined instruction exception (#UD) on older processors that do not support RTM. RTM also requires the programmer to provide an alternate code path for when the transactional execution is not successful.
In summary: “Intel Transactional Synchronization Extensions (Intel TSX) comes in two flavors: HLE and RTM. Hardware Lock Elision (HLE) is legacy compatible. Restricted Transactional Memory (RTM) offers flexibility but requires the programmer to provide an alternative code path for when transactional execution is not successful.”
The specification describes these extensions in detail and outlines various programming considerations to get the most out of them.
Intel TSX Applicability
Intel TSX targets a certain class of shared-memory multi-threaded applications; specifically multi-threaded applications that actively share data. Intel TSX is about allowing programs to achieve fine-grain lock performance without requiring the complexity of reasoning about fine-grain locking.
However, if there is high data contention the algorithm would need to change in order to have an opportunity for high scalability. There are no magic bullets that can solve the problem, since true high data contention implies that the algorithm is effectively serialized.
How Intel TSX might affect higher-level programming methods, or enable new programming models, is beyond the scope of my blog. Several experimental compiler implementations, not related specifically to Intel TSX, are available including gcc 4.7 which will have an experimental implementation. We can expect languages standards committees will be reviewing proposals on how to add transactional models at a language level (Intel has supported the creation of the Draft Specification of Transaction Language Constructs for C++). Intel TSX may enable a more efficient implementation of some transactional models than without Intel TSX. Much work remains to focus on real-world examples of usages and applications to develop and refine future usage. Good luck to all involved!
While Intel TSX may enable efficient implementations of new programming models, it does not require a new programming model and does not propose a new programming model. Intel TSX provides hardware-supported transactional-execution extensions to ease the development and improve the performance of existing programming models.
Intel TSX provides extensions that allow programmers to specify regions of code for transactional execution. Programmers can use these extensions to achieve higher performance with lesser effort, for example achieve fine-grain locking performance while programming with coarser-grain locks. This is a big help and therefore big news for programmers.
Please check out the specification and stay tuned for information about supporting tools from Intel and others in the coming months.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804