Shared memory on Xeon

     Here is an observation I have made. Can you help me explain it?

     Setup 1: A process pinned to core 3 on package 0 constantly writes to shared memory allocated on the local node 0 (the node attached to that package). Another process on core 1 of the same package 0, attached to node 0, constantly reads it. The read latency I measure is around 70 clock cycles.

     Setup 2: A process running on core 2 of package 1 constantly writes to shared memory allocated on the remote node 0. Another process reads it from core 1 on package 0, local to the shared memory's node 0. In this case the reader completes a read in about 3 cycles (within statistical error).

     Why does the reader incur a smaller penalty reading this shared memory location when the writer runs on the remote package than when the writer runs on another core of the local package?


Can we see your test code?

If I were to guess, Setup 1 is reading from RAM, whereas Setup 2 is reading from L1.

This seems to be reversed from what you would expect.

Are you timing reads without regard to whether the memory has changed?

If so, Setup 2 would have longer intervals between writes reaching the line, and thus fewer cache-line evictions caused by the other socket, so the reader would hit its own cached copy and read the same value many times.

Jim Dempsey
