Intel TSX implementation properties

Posted by krahnack

Hi,

I hope this forum is the right place to ask this question, please forgive me if it is not.

I am trying to run some benchmarks to measure the read-set and write-set capacity available to RTM on a Haswell machine. However, the results I get are quite surprising, since they are larger than the L1 and L2 caches: I find a maximum write set of about 280 KB and a maximum read set of about 512 KB, which should not be possible according to the Intel specification (the write set should not exceed the L1 data cache capacity, 32 KB, and the read set should not exceed the L2 cache capacity, 256 KB).

I must be doing something wrong, but I cannot tell what exactly. The principle of my benchmark is quite straightforward: I allocate a big array (100MB), and I try to access a specific size from inside a transaction. I increase the accessed size until the transaction fails with a capacity abort.
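For readers who want to reproduce this kind of measurement, here is a minimal sketch of such a probe in C using the RTM intrinsics `_xbegin` and `_xend` from `<immintrin.h>`. It is not the original poster's code: the helper names (`rtm_supported`, `txn_commits`, `probe_capacity`), the 64-byte line stride, the 100 retries per size, and the GCC/Clang `target("rtm")` attribute are all assumptions made for illustration. The buffer is written once before measuring so that page faults do not abort the transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support. */
static int rtm_supported(void) {
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 7) return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (int)((ebx >> 11) & 1u);
}

/* Touch one byte per assumed 64-byte line of buf[0..bytes) inside a single
   transaction. Returns 1 if any of `attempts` tries committed, else 0. */
__attribute__((target("rtm")))
static int txn_commits(volatile unsigned char *buf, size_t bytes,
                       int do_write, int attempts) {
    assert(buf != NULL && attempts > 0);
    for (int a = 0; a < attempts; a++) {
        if (_xbegin() == _XBEGIN_STARTED) {
            unsigned char sink = 0;
            for (size_t i = 0; i < bytes; i += 64) {
                if (do_write) buf[i] = 1;   /* grows the write set */
                else sink ^= buf[i];        /* grows the read set */
            }
            _xend();
            (void)sink;
            return 1;
        }
        /* On abort, execution resumes here with an abort status; retry. */
    }
    return 0;
}
#else
/* Non-x86 builds: report RTM as unavailable. */
static int rtm_supported(void) { return 0; }
static int txn_commits(volatile unsigned char *buf, size_t bytes,
                       int do_write, int attempts) {
    (void)buf; (void)bytes; (void)do_write; (void)attempts;
    return 0;
}
#endif

/* Grow the accessed size in step_bytes increments until no attempt commits.
   Returns the largest size that committed, or -1 if RTM is unavailable,
   the arguments are invalid, or allocation fails. */
static long probe_capacity(size_t max_bytes, size_t step_bytes, int do_write) {
    if (step_bytes == 0 || !rtm_supported()) return -1;
    unsigned char *buf = malloc(max_bytes);
    if (buf == NULL) return -1;
    memset(buf, 0, max_bytes);  /* pre-fault every page before measuring */
    long best = 0;
    for (size_t sz = step_bytes; sz <= max_bytes; sz += step_bytes) {
        if (!txn_commits(buf, sz, do_write, 100)) break;
        best = (long)sz;
    }
    free(buf);
    return best;
}
```

Calling `probe_capacity(1 << 20, 1024, 1)` would probe the write set in 1 KB steps up to 1 MB, and passing `0` as the last argument probes the read set instead. The `volatile` accesses are there so the compiler cannot drop or merge the loads and stores, which is one way a benchmark like this can report misleading sizes.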

Can someone provide some insight about this behaviour?

Thanks,

Sylvain

Posted by Sergey Kostrov

>>...L2 cache capacity, 256KB...

Take a look at the datasheet for your CPU on ark.intel.com for information about its cache sizes. I don't have a Haswell system but, for example, on the Ivy Bridge system I have, the L2 cache totals 1 MB (256 KB per core, 4 cores), and it is shared between data and instructions.

Posted by krahnack

Thanks for your reply.

I already know the cache hierarchy of my machine: the L1 caches are 32 KB each (one for data, one for instructions) and private to each core. The L2 cache is 256 KB and also private to each core, but it is shared between data and instructions. The L3 cache is 8 MB and shared among all 4 cores.

What I do not understand is that, according to the Intel specification, the TSX write set should not exceed the L1 cache size and the TSX read set should not exceed the L2 cache size. However, my benchmark gives me results that are bigger than those sizes, so I am trying to get some information here about the actual sizes of the TSX read and write sets, or about how to measure them correctly.

Posted by le g.

Does the Intel specification state which cache level is used? What is your basis for saying that the write set is in L1 and the read set is in L2? Why couldn't the write set be implemented in L2, or the read set use L3?

Posted by le g.

To supplement my previous post: the optimization manual clarifies that both the read set and the write set are tracked in the L1 cache, so your problem is really strange. Can you share any new findings?

Posted by Rolf Andersson

Could you provide a specific document reference to the supplement?
AFAIK, the latest document version is 248966-028 (July 2013).
EDIT: Just re-read that doc; section 12.1.1 says that the processor tracks addresses in the L1 cache.

Thx,
Rolf 

Posted by le g.

Right. It's in section 12.1.1 in version 248966-028.

"The processor tracks both the read-set addresses and the write-set addresses in the first level data cache (L1 cache) of the processor."

For the read set, there is an additional implementation-specific second-level structure, which is not necessarily in L2.
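To make the L1 bound concrete, here is a small C sketch of the arithmetic. It assumes the Haswell L1 data cache geometry (32 KB, 8-way set associative, 64-byte lines) and assumes that a transactionally written line has to stay resident in L1; neither detail is spelled out in this thread, and the helper names are hypothetical. It is a model of the cache layout only, and it executes no transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1D geometry (not stated in the thread): 32 KB, 8-way, 64 B lines. */
enum { L1D_BYTES = 32 * 1024, L1D_WAYS = 8, LINE_BYTES = 64 };
enum { L1D_LINES = L1D_BYTES / LINE_BYTES,   /* 512 lines */
       L1D_SETS  = L1D_LINES / L1D_WAYS };   /* 64 sets   */

/* Set index of an address: line number modulo the number of sets. */
static unsigned l1d_set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_BYTES) % L1D_SETS);
}

/* Number of distinct lines, spaced stride_bytes apart from offset 0, that
   fit in L1 before one set would need more than L1D_WAYS lines.
   stride_bytes is assumed to be a nonzero multiple of LINE_BYTES. */
static size_t max_resident_lines(size_t stride_bytes) {
    assert(stride_bytes >= LINE_BYTES && stride_bytes % LINE_BYTES == 0);
    unsigned used[L1D_SETS] = {0};
    size_t n = 0;
    for (;;) {
        unsigned set = l1d_set_index((uintptr_t)(n * stride_bytes));
        if (used[set] == L1D_WAYS) return n;  /* this set is already full */
        used[set]++;
        n++;
    }
}
```

Under these assumptions, `max_resident_lines(64)` gives 512 lines (the full 32 KB) for a contiguous walk, while `max_resident_lines(4096)` gives only 8, because every line then maps to the same set. A measured write set far above 32 KB would therefore not fit this model, which is why the numbers reported at the top of the thread look suspicious.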

Posted by Rolf Andersson

Yes, just saw that. Sorry to take up your time; I should have read it again before asking.

Thanks,
Rolf 
