Coarse-grained locks, and the importance of transactions, are key concepts that motivate why Intel Transactional Synchronization Extensions (TSX) is useful. I’ll do my best to explain them in this blog.
In my blog "Transactional Synchronization in Haswell," I describe new instructions (Intel TSX) that will improve the performance of coarse-grained locks. Understanding coarse-grained locks and the concept of transactions are both key to understanding why Intel TSX matters.
Intel TSX may enhance performance of mutual exclusion other than simple coarse-grained locks, but I will focus on coarse-grained locking because it is common and Intel TSX allows highly concurrent accesses using only a simple locking mechanism.
To motivate by illustration, let's consider a simple hash table. Hash tables are used to map a key to a key and value pair in linear time. Two key operations are add (insert) and lookup (retrieve). Resizing and deletion are two additional operations of general interest also, but I will leave them for another time.
Designing a highly concurrent hash table is a non-trivial task, and there are many approaches to allow high levels of concurrency. All these approach add complexity to the program, and often to the data structures themselves.
The simplest approach is a single lock approach. In such an approach, every operation on the hash table starts by obtaining the lock for the table and concludes by releasing the lock. While the lock is held for the operation, no other task on the system can obtain the lock and therefore no hash table operation is allowed to proceed.
Considering Figure 1, no concurrent operations are allowed, so each of the five operations shown would occur one at a time.
A common solution is to break the hash table into smaller regions, and have locks that apply to regions. While this can reduce contention, it still can create needless delays and it definitely complicates the coding and the data structure.
Such an approach is a prime example of taking a coarse-grained lock (a single lock for the entire hash table) and working to make it a finer grained lock (multiple locks for smaller table sections). Coarse-grained locks are easier to use, easier to understand and easier to debug. The only disadvantage is that they tend to impede performance in a multithreaded environment. Multicore processors are increasing the likelihood of this being a problem, and help motivate new hardware assistance so that programming has a chance to stay simple more often than without assistance.
Transactional Synchronization (Intel TSX) as a solution
What would be ideal, is to use the single lock (coarse-grained locking) because it is easy and not very error prone, but still have the performance of a fine-grained implementation. In our Figure 1 example, only one operation conflicts with another. This example does have more conflicts that would be expected in a real world example.
Considering this example, three of the operations have no collision with the other operation so the use of HLE (part of Intel TSX) on the single lock will completely elide the lock. In other words, the performance is very close to the performance of the code if no locking or unlocking code was present. The key however is that the operations are protected by the Intel TSX hardware, which has silently ensured that the protection intended by the lock is indeed assured.
The two operations that map to the same hash table entry will need to be staggered. This will occur even if we are unlucky enough to have them happen at the same time. In such a case, the Intel TSX will detect that the lock was indeed needed and some locking overhead will be incurred. What would actually happen in such a case, is that the colliding tasks will proceed into the protected code until the processor detects the conflict. As such a point, both updates will abort their protected code (also called the transaction). The most common solution then is to have each task proceed but actually enforce the lock on the second try. This means that one task will win, and delay the other, until the operation is complete. The precise decision on how to handle the collision is either up to the processor implementation with HLE, or the programmer with RTM. The processor implementation for HLE will also be fairly simple and conservative, in order to preserve the semantics of the original lock and hence compatibility with processors that lack Intel TSX.
For a hash map, Intel TSX allows for the right things to occur without losing the protection that the locks need to give. Intel TSX ensures the same results as the coarse-grained lock guarantees, but allows unrelated operations to proceed without delays that the coarse-grained locks would have caused. For more information on Transactional Synchronization, see my blog on Intel TSX.
Please check out the specification and stay tuned for information about supporting tools from Intel and others in the coming months.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804