Memory Latencies on Intel® Xeon® Processor E5-4600 and E7-4800 product families

Intel currently offers two families of Intel® Xeon® processors designed for four-socket systems: the Intel® Xeon® Processor E5-4600 and E7-4800 product families. This blog post looks at how they differ in memory latency and how you can measure that latency on your own system.

All current Intel® Xeon® processors have integrated memory controllers, i.e. the memory is directly connected to one of the processors. Furthermore, each Intel® Xeon® E7 processor is equipped with 4 Intel® QPI links:

  • One Intel® QPI link is connected to the I/O hub.
  • Three Intel® QPI links connect to the other three processors.

 

Figure: Intel Xeon E7-4800 topology

 

I’m using the recently released Intel® Memory Latency Checker, which prints the latencies from each of the sockets using memory of each of the sockets:

 

CPU      0      1      2      3
0      136    194    198    201
1      194    135    194    196
2      201    194    135    200
3      202    197    198    135

The matrix displays the memory latency when a core on socket n is accessing memory on socket m. As with all measurements, your numbers might vary. Different DIMM configurations and different BIOS settings result in different latencies. Furthermore, there were other background processes active on the system, which disturb the measurement and explain the observed variance.
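To make the matrix easier to read at a glance, the local and remote entries can be averaged separately. The snippet below is a minimal sketch, not part of Intel® Memory Latency Checker. It assumes that rows are the socket of the measuring core, that columns are the socket owning the memory, and that the values are nanoseconds; the function name `local_remote_summary` is made up for this illustration.

```python
# E7-4800 latency matrix from the measurement above (assumed to be in ns).
# Assumption: row = socket of the measuring core, column = socket of the memory.
E7_4800_NS = [
    [136, 194, 198, 201],
    [194, 135, 194, 196],
    [201, 194, 135, 200],
    [202, 197, 198, 135],
]


def local_remote_summary(matrix):
    """Return (mean local latency, mean remote latency) of a square matrix."""
    n = len(matrix)
    # Diagonal entries: core and memory sit on the same socket.
    local = [matrix[i][i] for i in range(n)]
    # Off-diagonal entries: memory sits on a different socket.
    remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(local) / len(local), sum(remote) / len(remote)


local_ns, remote_ns = local_remote_summary(E7_4800_NS)
print(f"local  ~{local_ns:.0f} ns")
print(f"remote ~{remote_ns:.0f} ns")
```

On this system all remote entries are close to each other, which fits a topology where every other socket is one QPI hop away.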

 

In the case of the Intel® Xeon® E5-4600 product family, each processor has only 2 Intel® QPI links. Because the PCIe controller is integrated, no QPI link to an I/O hub is needed. Furthermore, each processor is directly connected to only two of the other processors, forming a ring topology:

 

Figure: Intel Xeon E5-4600 topology

 

There are therefore three different cases in terms of memory latency:

  1. The software is referencing memory that is directly attached to the socket where the process or thread is running.
  2. The software is referencing memory that is attached to a socket that is 1 QPI hop away.
  3. The software is referencing memory that is attached to a socket that is 2 QPI hops away.
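The three cases follow directly from the link topology. As a rough sketch, the hop count from one socket to every other socket can be computed with a breadth-first search over the QPI links. The socket numbering and the ring order 0-1-2-3-0 are assumptions made for this illustration, as are the names `E7_LINKS`, `E5_LINKS` and `qpi_hops`.

```python
from collections import deque

# Socket-to-socket QPI links as described in the text.
# Assumption: sockets are numbered 0..3 and the E5 ring runs 0-1-2-3-0.
E7_LINKS = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}  # fully connected
E5_LINKS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}              # ring


def qpi_hops(links, src):
    """Breadth-first search: QPI hops from socket src to every socket."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer not in hops:
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops


e7_hops = qpi_hops(E7_LINKS, 0)
e5_hops = qpi_hops(E5_LINKS, 0)
print("E7-4800 hops from socket 0:", e7_hops)
print("E5-4600 hops from socket 0:", e5_hops)
```

With full connectivity every remote socket is one hop away, whereas the ring leaves one socket two hops away.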

This is also nicely reflected when the latency is measured with Intel® Memory Latency Checker:

CPU      0      1      2      3
0       72    291    323    294
1      296     72    293    315
2      319    296     71    296
3      290    325    300     71

The memory latency on the same socket is about 70-75 ns. If the memory is on a neighboring socket, one QPI hop away, I measure about 290-300 ns. The far socket, two QPI hops away, shows a latency of about 315-325 ns.
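These ranges can be checked by grouping the measured values by hop count. The following is only a sketch: it assumes the same row and column meaning as before (row = socket of the core, column = socket of the memory) and a ring in the order 0-1-2-3-0, and the helper `ring_hops` is a hypothetical name.

```python
# E5-4600 latency matrix from the measurement above, in ns.
E5_4600_NS = [
    [72, 291, 323, 294],
    [296, 72, 293, 315],
    [319, 296, 71, 296],
    [290, 325, 300, 71],
]


def ring_hops(i, j, sockets=4):
    """Shortest distance between sockets i and j on a ring (assumed order 0-1-2-3-0)."""
    d = abs(i - j)
    return min(d, sockets - d)


# Collect every measured latency under its hop count.
by_hops = {}
for i, row in enumerate(E5_4600_NS):
    for j, ns in enumerate(row):
        by_hops.setdefault(ring_hops(i, j), []).append(ns)

latency_range = {h: (min(v), max(v)) for h, v in sorted(by_hops.items())}
for hops, (lo, hi) in latency_range.items():
    print(f"{hops} hop(s): {lo}-{hi} ns")
```

That the values separate cleanly into three bands under this numbering supports the assumed ring order.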

These numbers also show that the Intel® Xeon® processor E5-4600 product family is optimized for local memory access. If a core accesses memory on its own socket, the latency is roughly half that of the Intel® Xeon® processor E7-4800 product family (about 72 ns versus about 135 ns). However, for remote accesses to memory on a different socket, the picture is reversed: the E7-4800 system needs about 195-200 ns, while the E5-4600 system needs about 290-325 ns. This becomes even more obvious when comparing the latencies in a combined chart (only showing the latencies for socket 0):

 

Figure: Intel Xeon E5 and E7 memory latencies

 

Which of the two systems is better for your software therefore largely depends on how good your software is at using local memory rather than remote memory. If a program is NUMA-aware, it can benefit greatly from the low local memory latency of an Intel® Xeon® E5-4600 processor. If it is difficult or impossible to ensure that the program uses mostly local memory, then an Intel® Xeon® processor E7-4800 might be the better choice.
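To get a feeling for where the trade-off tips, a very simple model can help: treat the average latency as a weighted mix of local and remote latency. The sketch below uses the socket 0 measurements from this post and assumes that remote accesses are spread evenly over the other three sockets. It ignores caches, prefetching, bandwidth and contention, so the result is a rough orientation, not a prediction. The names `average_latency` and `break_even_fraction` are invented for this example.

```python
def average_latency(local_ns, remote_ns, local_fraction):
    """Weighted mean latency for a given fraction of local accesses."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Socket 0 rows of the two matrices in this post, in ns.
# Assumption: remote accesses are spread evenly over the other three sockets.
E5_LOCAL, E5_REMOTE = 72, (291 + 323 + 294) / 3
E7_LOCAL, E7_REMOTE = 136, (194 + 198 + 201) / 3


def break_even_fraction():
    """Local fraction p at which both systems have the same average latency.

    Solves p*E5_LOCAL + (1-p)*E5_REMOTE == p*E7_LOCAL + (1-p)*E7_REMOTE.
    """
    gap_remote = E5_REMOTE - E7_REMOTE  # E5 disadvantage on remote accesses
    gap_local = E7_LOCAL - E5_LOCAL     # E5 advantage on local accesses
    return gap_remote / (gap_remote + gap_local)


break_even = break_even_fraction()
print(f"break-even local fraction: {break_even:.2f}")
for p in (0.5, 0.9):
    e5 = average_latency(E5_LOCAL, E5_REMOTE, p)
    e7 = average_latency(E7_LOCAL, E7_REMOTE, p)
    print(f"{p:.0%} local: E5 ~{e5:.0f} ns, E7 ~{e7:.0f} ns")
```

Under these assumptions the E5-4600 system only comes out ahead once well over half of the memory accesses are local.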
