Hi,

I am running on a 2S Intel motherboard, an S2600GZ, with 2 x E5-2670 CPUs. I'm measuring my expected cache latencies (4 from L1, 12 from L2, 40 from L3), but when I measure the latency of the test described below, using huge pages, on SLES 11 SP2, I get either X or 2X the latency from run to run. In some cases the latency is 80-90 ns and in others it's 160-180 ns. I'm sure the latency isn't the latter. I've pulled 1 CPU out of the motherboard, thinking I may be inadvertently accessing its memory, but that's rectified this issue. Do you have any idea why I'm observing this behavior?

The test does the following:
1) Allocates a large span of memory, say 32MB.
2) Accesses, in random order, one 8B element in each 4096B block, with only 1 access per 4096B block.
3) Each accessed element contains the pointer to the next access, and so on.
4) Once you've made the measurement, you flush every step of the walk using CLFLUSH.
5) Repeat until you get a good memory latency estimate.

I've affinitized the process with "numactl" to no avail.

Lastly, I've accessed every 128KB of a likewise 32MB array and measured the latency of that pointer chase, and I don't observe this behavior. I get a reproducible number for the latency in that test.

Any pointers or information as to things I should be aware of would be greatly appreciated.

perfwise
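
For reference, the test steps listed above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the code that produced the numbers in this post: the names build_chain, walk, flush_chain and ns_per_load are made up for illustration, the buffer is assumed to be allocated by the caller (for example from huge pages), timing uses clock_gettime rather than a cycle counter, and _mm_clflush is only used when the compiler targets SSE2.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>            /* _mm_clflush, _mm_mfence */
#endif

#define STRIDE 4096               /* one 8B pointer slot per 4096B block */

/* Link the first pointer-sized slot of every STRIDE block into a single
 * random cycle. Returns the number of blocks, or 0 on failure. */
size_t build_chain(char *buf, size_t bytes, unsigned seed)
{
    size_t n = bytes / STRIDE;
    size_t *order = malloc(n * sizeof *order);
    if (order == NULL || n == 0) { free(order); return 0; }

    for (size_t i = 0; i < n; i++) order[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    /* Block order[i] points at block order[i+1]; the last wraps to the first. */
    for (size_t i = 0; i < n; i++) {
        void **slot = (void **)(buf + order[i] * STRIDE);
        *slot = buf + order[(i + 1) % n] * STRIDE;
    }
    free(order);
    return n;
}

/* Dependent-load walk: each load supplies the address of the next one. */
void *walk(void *start, size_t steps)
{
    void *p = start;
    while (steps--) p = *(void *volatile *)p;     /* volatile keeps the loads */
    return p;
}

/* Evict every line touched by the walk (no-op if SSE2 is unavailable). */
void flush_chain(void *start, size_t steps)
{
#if defined(__SSE2__)
    void *p = start;
    while (steps--) {
        void *next = *(void **)p;                 /* read before flushing */
        _mm_clflush(p);
        p = next;
    }
    _mm_mfence();                                 /* order the flushes */
#else
    (void)start; (void)steps;
#endif
}

/* Time one walk, then flush it so the next repetition misses again. */
double ns_per_load(void *start, size_t steps)
{
    struct timespec t0, t1;
    assert(steps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    void *end = walk(start, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)end;
    flush_chain(start, steps);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

A caller would allocate the 32MB span (on Linux, mmap with MAP_HUGETLB is one way to get huge pages), call build_chain once, call flush_chain once so the first measurement starts cold, and then call ns_per_load repeatedly and average the results. Because the chain is a single cycle, the walk can start at the beginning of the buffer.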