Intel® Transactional Synchronization Extensions (Intel® TSX) provides hardware transactional memory support. It exposes a speculative execution mode to the programmer to improve locking performance. There are many publications about Intel TSX, so this article does not explain the concept itself. For background, see the comprehensive list of TSX-related technical resources on Roman Dementiev's blog.
Intel TSX allows implementing speculative locks. No serialization occurs, even with coarse-grained locks, unless an actual conflict is detected. A conflict aborts the transaction and executes a “fallback path”, which usually takes a traditional lock. To get good performance and scalability, keep track of the abort rate and keep abort costs as low as possible.
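To make the speculative-lock pattern concrete, here is a minimal sketch in C of a critical section that first tries to run as an RTM transaction and takes a simple spin lock as the fallback path after repeated aborts. It is not taken from any Intel library: the names `counter_increment`, `try_elided_increment`, `fallback_held` and the retry budget `MAX_RETRIES` are hypothetical, and the code assumes GCC or Clang on x86 (for `<cpuid.h>` and the `target("rtm")` attribute). On CPUs that do not report RTM, the sketch goes straight to the fallback lock.

```c
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang) */
#include <immintrin.h>  /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#include <stdatomic.h>

#define MAX_RETRIES 3              /* hypothetical retry budget */

static atomic_int fallback_held;   /* 1 while a thread owns the fallback lock */
static long shared_counter;        /* data protected by the lock */

/* CPUID leaf 7, subleaf 0: EBX bit 11 is set when RTM is available. */
static int cpu_has_rtm(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 11) & 1u);
}

/* Traditional (non-speculative) spin lock used as the fallback path. */
static void fallback_lock(void)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&fallback_held, &expected, 1))
        expected = 0;   /* a failed CAS overwrites expected; reset it */
}

static void fallback_unlock(void)
{
    atomic_store(&fallback_held, 0);
}

/* Try the critical section as a transaction. Returns 1 on commit, 0 if
   every attempt aborted. */
__attribute__((target("rtm")))
static int try_elided_increment(void)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        /* Do not start speculating while the fallback lock is held. */
        while (atomic_load(&fallback_held))
            ;
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Reading the lock word adds it to the read set, so a thread
               that later takes the fallback lock aborts this transaction. */
            if (atomic_load(&fallback_held))
                _xabort(0xff);
            shared_counter++;
            _xend();
            return 1;
        }
        /* After an abort, _xbegin() returns a status code instead of
           _XBEGIN_STARTED and control arrives here; retry. */
    }
    return 0;
}

void counter_increment(void)
{
    if (cpu_has_rtm() && try_elided_increment())
        return;
    /* Fallback path: serialize with the traditional lock. */
    fallback_lock();
    shared_counter++;
    fallback_unlock();
}

long counter_value(void)
{
    return shared_counter;
}
```

Call `counter_increment()` from several threads to exercise the pattern. Each time `_xbegin()` returns something other than `_XBEGIN_STARTED`, a transaction has aborted, and every trip through the fallback path serializes the threads, which is why the abort rate matters for scalability.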
VTune Amplifier XE now has the ability to profile TSX aborts. You can collect the number of aborts, see the abort reason and source code line that caused the abort. This article describes how to get the information about transaction aborts. Future VTune Amplifier XE updates may add more capabilities that will be covered by new articles.
NOTE! This is an experimental feature. It may or may not appear in a production release. There are several limitations that may be resolved during future development. See the list of limitations below.
Prerequisites
- CPU: a 4th generation Intel® Core™ processor based on the Intel microarchitecture code name Haswell. Note: CPUs with an unlocked multiplier (K versions such as i7-4770K, i5-4670K, etc.) do not support TSX.
- Intel® VTune™ Amplifier XE 2013 update 14 or later
- OS: Windows or Linux
Running TSX Exploration profile:
- Set the environment variable to enable the experimental feature:
$ export AMPLXE_EXPERIMENTAL=tsx
- Run the VTune Amplifier XE Graphical User Interface (GUI), configure a project, open the “New analysis” dialog, and select the “TSX Exploration” analysis type.
- Start the analysis.
Understanding analysis results
The current implementation of the “TSX Exploration” analysis can provide the following information:
- Total number of transaction aborts in the application
- Number of transaction aborts for a function, thread, source code line and instruction
- Abort reason: instruction, data conflict, capacity or other
- Average number of wasted cycles due to abort (Abort Cycles Histogram)
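For comparison, software that uses RTM can see a coarser version of the abort reason in the status word that `_xbegin()` returns after an abort. The sketch below decodes that status into categories similar to the ones listed above. It is an illustration only, not how VTune classifies aborts (VTune derives its data from PMU events). The enum and function names are hypothetical, and the bit values are copied from the documented `_XABORT_*` flags so that the code compiles on machines without TSX.

```c
#include <assert.h>

/* Status bits reported after an RTM abort. These mirror the _XABORT_*
   macros in <immintrin.h> and are repeated here so the sketch does not
   depend on that header. */
#define STATUS_EXPLICIT (1u << 0)  /* aborted by an XABORT instruction */
#define STATUS_RETRY    (1u << 1)  /* the transaction may succeed on retry */
#define STATUS_CONFLICT (1u << 2)  /* another thread touched our data */
#define STATUS_CAPACITY (1u << 3)  /* internal buffers overflowed */
#define STATUS_STARTED  (~0u)      /* same value as _XBEGIN_STARTED */

/* Hypothetical reason categories for this illustration. */
enum abort_reason {
    ABORT_EXPLICIT,
    ABORT_CONFLICT,
    ABORT_CAPACITY,
    ABORT_OTHER
};

/* Map an abort status word to one reason category. */
enum abort_reason classify_abort(unsigned status)
{
    /* Only an abort status is meaningful here, not "started". */
    assert(status != STATUS_STARTED);

    if (status & STATUS_CONFLICT)
        return ABORT_CONFLICT;
    if (status & STATUS_CAPACITY)
        return ABORT_CAPACITY;
    if (status & STATUS_EXPLICIT)
        return ABORT_EXPLICIT;
    return ABORT_OTHER;   /* none of the reason bits above is set */
}
```

A status with none of these reason bits set lands in `ABORT_OTHER`. This can happen, for example, when an instruction that cannot execute transactionally forces the abort.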
The Summary pane contains the total abort statistics for an application: the total number of aborts, the abort reasons, and a histogram showing the number of aborted transactions within a specific duration (in cycles).
In the Bottom-up pane you can find details about the functions that caused transaction aborts. You can also choose a different grouping to get this information by thread, module, etc. The Timeline shows the PMU events (HLE_RETIRED.ABORTED_PS and RTM_RETIRED.ABORTED_PS) for each thread.
Double-click a function to drill down to source or assembly. There you can identify the instructions that caused aborts for a particular reason.
Limitations
- The major limitation is that only aborted transactions are counted. There is no data about successful transactions, which prevents calculating the transaction abort ratio. To get the ratio of successful to aborted transactions, take a look at Intel® PCM and the Linux perf tools.
- Abort reasons are detected only when the “TSX Exploration” analysis runs without stacks. If you enable the “collect stacks” checkbox in the new analysis window (or from the command line), you can still collect abort counts and drill down to source, but abort reasons will not be reported.
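If another tool supplies the missing count of started transactions, the abort ratio is a simple division. The sketch below assumes two raw counts, for example from the `RTM_RETIRED.START` and `RTM_RETIRED.ABORTED` hardware events. The function name and the definition of the ratio as aborted over started transactions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of started transactions that aborted, in the range [0, 1].
   Returns 0.0 when no transaction was started. Assumes exact counts,
   so the number of aborts cannot exceed the number of starts. */
double abort_ratio(uint64_t started, uint64_t aborted)
{
    assert(aborted <= started);
    if (started == 0)
        return 0.0;
    return (double)aborted / (double)started;
}
```

For instance, 200 started transactions with 50 aborts give `abort_ratio(200, 50) == 0.25`, meaning one transaction in four ended up on the fallback path or had to be retried.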
The “TSX Exploration” feature is in the experimental phase and has limitations. To overcome them, you can complement VTune Amplifier XE data with information from other tools. Even so, the “TSX Exploration” profile provides valuable information, such as abort reasons, attribution of aborts to source code, and average abort processing duration, so you may find it useful to try this early version of TSX profiling in VTune Amplifier XE.