Intel TSX implementation properties

I hope this forum is the right place to ask this question; please forgive me if it is not.

I am trying to run some benchmarks to measure the read-set and write-set sizes available in RTM on a Haswell machine. However, the results I get are quite surprising, since they are larger than the L1 and L2 caches: I find the maximum write set is about 280 KB and the read set is about 512 KB, which should not be possible according to the Intel specification (the write set should not exceed L1 cache capacity, 64KB, and the read set should not exceed L2 cache capacity, 256KB).

I must be doing something wrong, but I cannot tell what exactly. The principle of my benchmark is quite straightforward: I allocate a big array (100MB), and I try to access a specific size from inside a transaction. I increase the accessed size until the transaction fails with a capacity abort.
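For readers who want to try something similar, here is a minimal sketch of the kind of benchmark described above. It is not the original poster's code. It assumes GCC or Clang on x86, a 64-byte cache line, and one touched byte per line; the names rtm_supported, touch_in_tx and max_tx_bytes are made up for illustration. The buffer is written once before any transaction starts, because a page fault inside a transaction aborts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#define ON_X86 1
#else
#define ON_X86 0
#endif

#define LINE_BYTES 64 /* assumed cache-line size */

static volatile unsigned sink; /* keeps the read loop from being optimized away */

/* 1 if CPUID leaf 7 (sub-leaf 0) sets EBX bit 11, the RTM feature flag. */
int rtm_supported(void) {
#if ON_X86
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 7) return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 11) & 1u);
#else
    return 0;
#endif
}

#if ON_X86
/* One transactional attempt touching one byte per line; returns 1 on commit. */
__attribute__((target("rtm")))
static int touch_in_tx(volatile uint8_t *buf, size_t bytes, int do_write) {
    unsigned acc = 0;
    if (_xbegin() == _XBEGIN_STARTED) {
        for (size_t i = 0; i < bytes; i += LINE_BYTES) {
            if (do_write) buf[i] = 1;
            else acc += buf[i];
        }
        _xend();
        sink = acc;
        return 1;
    }
    return 0; /* aborted: capacity, conflict, interrupt, ... */
}
#endif

/* Largest size in bytes (a multiple of step) that committed at least once
 * before the first size where every attempt aborted.
 * Returns -1 if RTM is unavailable or the allocation fails. */
long max_tx_bytes(size_t limit, size_t step, int do_write, int attempts) {
    assert(step >= LINE_BYTES && step % LINE_BYTES == 0 && limit >= step);
    (void)do_write; (void)attempts;
    if (!rtm_supported()) return -1;
#if ON_X86
    uint8_t *raw = malloc(limit + LINE_BYTES);
    if (!raw) return -1;
    memset(raw, 0, limit + LINE_BYTES); /* pre-fault every page */
    uint8_t *buf = (uint8_t *)(((uintptr_t)raw + LINE_BYTES - 1)
                               & ~(uintptr_t)(LINE_BYTES - 1));
    long best = 0;
    for (size_t bytes = step; bytes <= limit; bytes += step) {
        int ok = 0;
        for (int a = 0; a < attempts && !ok; a++)
            ok = touch_in_tx(buf, bytes, do_write);
        if (!ok) break;
        best = (long)bytes;
    }
    free(raw);
    return best;
#else
    return -1;
#endif
}
```

Calling max_tx_bytes(1 << 20, 4096, 1, 16) would probe the write set in 4 KB steps up to 1 MB, and passing 0 as the third argument probes the read set. The sketch does not distinguish capacity aborts from other aborts; to do that, keep the value returned by _xbegin() and test its _XABORT_CAPACITY bit.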

Can someone provide some insight about this behaviour?




>>...L2 cache capacity, 256KB...

Take a look at the datasheet for your CPU for information about cache sizes. I don't have a Haswell system but, for example, on an Ivy Bridge system I have, the size of the L2 cache is 1MB (256KB per core, 4 cores) and it is shared for data and instructions.

Thanks for your reply.

I already know the cache hierarchy of my machine: L1 caches are 32KB and private (one per core) for both data and instructions. L2 cache is 256KB and also private (one per core) but it is shared for data and instructions. L3 cache is shared among all the 4 cores and is 8MB.

What I do not understand is that, according to Intel specification, the TSX write set should not exceed L1 cache size and the TSX read set should not exceed L2 cache size. However, my benchmark gives me results that are bigger than those sizes, so I am trying to get some information about the actual sizes of TSX read and write sets here, or how to measure them correctly.

Does the Intel specification state which cache level is used? On what basis do you say that the write set is in L1 and the read set is in L2? Why can't the write set be implemented in L2, and why can't the read set use L3?

A supplement: the optimization manual clarifies that both the read set and the write set are tracked in the L1 cache, so your result is really strange. Can you share your new findings?

Could you provide a specific document reference for the supplement?
AFAIK, the latest document version is 248966-028 (July 2013).
EDIT: I just re-read that doc; section 12.1.1 says that the processor tracks addresses in the L1 cache.


Right. It's in section 12.1.1 in version 248966-028.

"The processor tracks both the read-set addresses and the write-set addresses in the first level data cache (L1 cache) of the processor."

For the read set, there is another implementation-specific second-level structure, which is not necessarily in L2.

Yes, I just saw that. Sorry to take up your time. I should have read it again before asking.

