In a previous post I discussed the Intel® Transactional Synchronization Extensions (Intel® TSX) technology released in the new generation of processors. I described the Intel® Threading Building Blocks (Intel® TBB) implementation of the HLE interface (speculative_spin_mutex). Now we can talk about the implementation of speculative_spin_rw_mutex, a Preview Feature of TBB 4.2 Update 2.
speculative_spin_rw_mutex uses RTM for mutual exclusion, and allows both concurrent reads and concurrent writes. It also contains a spin_rw_mutex because it may be necessary to perform the operation without speculation.
If concurrent executions of code protected by the mutex do not conflict, all reads complete and writes are atomically committed without explicitly taking the lock.
If there is a conflict or another problem that prevents speculative execution, and the transaction is aborted, speculative_spin_rw_mutex may retry the transaction or it may take the lock for real. If a writer takes the lock for real, all speculative readers and writers will abort their transactions and wait for the writer to complete, at which time the transactions may be retried. If a reader takes the lock for real, all speculative writers will abort their transactions and wait for the reader to release the lock, at which time the transactions may be retried. Real readers and speculative readers may proceed in parallel. All this happens “under the covers”, as part of the TBB implementation.
The speculative_spin_rw_mutex has to contain a regular spin_rw_mutex because there are no completion guarantees with RTM. The code being protected by the mutex may have operations (such as system calls) that cannot be completed in a transaction. There are also limits to the number of cache lines that can be accessed or modified in a transaction, and if that limit is reached the transaction cannot complete. The speculative_spin_rw_mutex guarantees forward progress by limiting the number of times a transaction is tried, and performing a non-speculative lock if necessary.
In the last post I mentioned that a speculative lock requires a “fallback path”, a code path that can be executed when the transaction fails. The speculative_spin_rw_mutex is designed such that the same code is used for both the speculative and fallback paths. This greatly simplifies its use.
If the transaction is aborted, RTM returns a code giving the reason for the abort. For instance, if a time-slice completes while a thread is executing speculatively, the transaction is aborted, but it may succeed if retried. If the return code indicates a retry may succeed, and if the maximum number of retries is not reached, the transaction will be re-attempted.
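The retry decision described above can be sketched in a few lines. This is only an illustration of the policy, not the TBB source: the names acquire, try_begin, max_retries and Path are hypothetical, and the status constants are defined locally (they mirror the RTM status word, where bit 1 means a retry may succeed) so the sketch runs on machines without Intel TSX.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status values for this sketch. They follow the layout of the
// RTM status word, but are defined here so no TSX hardware is required.
constexpr unsigned kStarted       = ~0u;      // transaction began successfully
constexpr unsigned kAbortRetry    = 1u << 1;  // abort was transient, retry may succeed
constexpr unsigned kAbortCapacity = 1u << 3;  // too many cache lines touched

enum class Path { Speculative, RealLock };

// try_begin stands in for starting a transaction: it returns kStarted on
// success or an abort status otherwise. max_retries is an assumed policy
// knob: one initial attempt plus at most max_retries re-attempts are made.
inline Path acquire(const std::function<unsigned()>& try_begin, int max_retries) {
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        const unsigned status = try_begin();
        if (status == kStarted) {
            return Path::Speculative;  // run the protected code transactionally
        }
        if ((status & kAbortRetry) == 0) {
            break;                     // the abort reason says a retry cannot help
        }
    }
    return Path::RealLock;             // fallback: take the lock for real
}
```

With a try_begin that reports a transient abort twice and then succeeds, acquire returns Path::Speculative after three attempts; with one that reports a capacity abort, it gives up after the first attempt and returns Path::RealLock.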
Several things to note about the mutex (some of which apply to
- The mutex occupies three cache lines, because the spin_rw_mutex and the write flag in the mutex must be on separate cache lines, and because allocators do not guarantee allocations occur at the start of a cache line.
- If the architecture does not support RTM, the speculative_spin_rw_mutex will default to a spin_rw_mutex padded to guarantee it is on a separate cache line.
- The class does not provide explicit methods to lock and unlock a mutex, i.e. a program cannot define a speculative_spin_rw_mutex M and execute an M.lock(). The proper way to use a speculative_spin_rw_mutex is to lock and unlock it with a scoped_lock:
      {
          tbb::speculative_spin_rw_mutex::scoped_lock lock(M);
          // code protected by mutex
          // on exit from block the mutex is unlocked
          // by destructor for the scoped_lock
      }
  This is because each thread must have local storage for thread state, and the scoped_lock on the stack contains that storage.
- speculative_spin_rw_mutex differs from other implementations (such as the pthreads mutexes) in that under speculation a write lock may be obtained recursively. Recursively acquiring the same lock in write mode deadlocks only when the lock is taken for real rather than under speculation. Programming patterns that depend on a recursive lock deadlocking are of special interest only; if you depend on this behavior, please don’t use a speculating mutex.
- Each implementation of Intel TSX has limits on how many levels of speculation are supported. These limits may change from generation to generation.
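The three-cache-line figure in the first item above follows from simple arithmetic, which the sketch below tries to make concrete. It is not the TBB layout: PaddedMutexSketch, lock_line and flag_line are hypothetical names, and a 64-byte line size is assumed. The idea is that if the storage may start anywhere, rounding up to the next line boundary wastes at most 63 bytes, so three lines of raw storage always contain two complete, aligned lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

// Round an address up to the next cache-line boundary.
constexpr std::uintptr_t align_up(std::uintptr_t addr) {
    return (addr + kCacheLine - 1) & ~(static_cast<std::uintptr_t>(kCacheLine) - 1);
}

// True if two full, aligned lines fit inside three lines of storage
// that begin at an arbitrary address `base`.
constexpr bool two_lines_fit(std::uintptr_t base) {
    return align_up(base) + 2 * kCacheLine <= base + 3 * kCacheLine;
}

// Hypothetical layout: three lines of raw storage, from which two aligned
// lines are carved out, one for the real lock and one for the write flag.
struct PaddedMutexSketch {
    unsigned char storage[3 * kCacheLine];

    // First aligned line inside the storage (would hold the real lock).
    void* lock_line() {
        return reinterpret_cast<void*>(
            align_up(reinterpret_cast<std::uintptr_t>(storage)));
    }
    // The following line (would hold the write flag).
    void* flag_line() {
        return static_cast<unsigned char*>(lock_line()) + kCacheLine;
    }
};
```

With only two lines of storage the same check fails for any base address that is not already aligned, which is why the extra line is needed when the allocator gives no alignment guarantee.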
Remember that not all the 4th Generation Intel® Core™ processors support transactional synchronization. You should check ark.intel.com to verify that Intel TSX is available in the processor you are using. On any processors not supporting Intel TSX the speculative mutexes will behave as their non-speculating counterparts, with possibly-worse performance.
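Besides looking up the part on ark.intel.com, a program can ask the processor directly. The sketch below reads CPUID leaf 7 (sub-leaf 0), where EBX bit 4 reports HLE and EBX bit 11 reports RTM. It assumes GCC or Clang on x86 for the __get_cpuid_count helper from cpuid.h; the names TsxSupport, decode_leaf7_ebx and query_tsx are made up for this example, and on other compilers or architectures the query simply reports no support.

```cpp
#include <cassert>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

struct TsxSupport {
    bool hle;  // Hardware Lock Elision
    bool rtm;  // Restricted Transactional Memory
};

// Test a single bit of a 32-bit register value.
inline bool bit_set(unsigned value, unsigned bit) {
    assert(bit < 32);
    return ((value >> bit) & 1u) != 0;
}

// Decode the EBX value returned by CPUID leaf 7, sub-leaf 0.
inline TsxSupport decode_leaf7_ebx(unsigned ebx) {
    return TsxSupport{bit_set(ebx, 4), bit_set(ebx, 11)};
}

// Query the running processor; reports no support where CPUID is unavailable.
inline TsxSupport query_tsx() {
#if SKETCH_HAVE_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return decode_leaf7_ebx(ebx);
    }
#endif
    return TsxSupport{false, false};
}
```

A caller might log query_tsx().rtm at startup to know whether the speculative path can be in play before interpreting performance numbers.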
Careful performance measurement will help you decide if speculative_spin_rw_mutex will help the scalability of your application.
For help optimizing your program with Intel TSX, you should consult the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Chapter 12.