Independent Channel vs. Lockstep Mode – Drive your Memory Faster or Safer

The latest Intel® Xeon® Processor E7 v2 Family lets you make an interesting choice: if you are willing to give up some of the high-availability features, you can further increase the already outstanding memory bandwidth. In this blog post, I'll explain how this works, why this is the case, and how you can observe the effect on your system using the Intel Performance Counter Monitor.

A processor in the latest Intel® Xeon® Processor E7 v2 Family comes with 4 Intel® Scalable Memory Interconnect 2 (Intel® SMI2) links. Each Intel® SMI2 link is connected to an Intel® C102/C104 Scalable Memory Buffer, which in turn provides two memory channels. Since up to three DIMMs per channel are supported, a system with 4 sockets can support up to 3*2*4*4 = 96 DIMMs.
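The DIMM count above is a plain product of the topology figures. A minimal sketch of that arithmetic, where the function and parameter names are made up for illustration and the default values are the ones quoted in the text:

```python
def max_dimms(sockets, smi2_links_per_socket=4,
              channels_per_buffer=2, dimms_per_channel=3):
    """Upper bound on DIMMs: one memory buffer per SMI2 link,
    two channels per buffer, up to three DIMMs per channel."""
    return (dimms_per_channel * channels_per_buffer
            * smi2_links_per_socket * sockets)


if __name__ == "__main__":
    for sockets in (1, 2, 4):
        print(sockets, "socket(s):", max_dimms(sockets), "DIMMs")
```

For a 4-socket system this gives the 96 DIMMs mentioned above, and 24 DIMMs per socket.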

Lockstep Mode

In lockstep mode, the scalable memory buffer distributes each cache line across its two channels. Half of a cache line is then located on a DIMM of one channel, and the other half is located on a DIMM of the other channel. As a result, effectively only one memory channel drives an Intel® SMI2 link, as opposed to two in independent channel mode. Furthermore, the Intel® SMI2 link operates at the same frequency as the memory.

Running the memory channels in lockstep mode has the advantage that you can apply an interesting trick to increase system availability. Normally, a DIMM has 16 memory devices for data plus 1 device for CRC and 1 device for parity. If one of the devices fails, its data can be reconstructed. This is called single-device data correction (SDDC).

Figure: DIMM with 18 devices (16 data, 1 CRC, 1 parity)

For double-device data correction (DDDC), you combine the 2 extra devices (CRC and parity) of each of 2 DIMMs, i.e. 4 devices per pair of DIMMs. This results in 32 "data" devices, 2 devices for CRC, 1 device for parity, and 1 spare device. If one of the devices fails, the spare device can replace it. After the failure of one device, you still have the benefit of SDDC. In summary, DDDC allows recovery from two sequential DRAM device failures on the memory DIMMs, as well as recovery from a subsequent single-bit soft error on the DIMM.
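The device bookkeeping behind SDDC and DDDC can be written down as a small model. This is only a sketch of the counts given above: the class and function names are hypothetical, and the assumption that the spare comes from the second parity device is inferred from those counts, not from a description of the hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceBudget:
    data: int
    crc: int
    parity: int
    spare: int = 0

    @property
    def total(self):
        return self.data + self.crc + self.parity + self.spare


# One DIMM as described in the text: 16 data + 1 CRC + 1 parity (SDDC).
SDDC_DIMM = DeviceBudget(data=16, crc=1, parity=1)


def dddc_pair(dimm):
    """Pool two lockstepped DIMMs. Assumption: the pair keeps one parity
    device, so the other redundancy device becomes the spare."""
    return DeviceBudget(data=2 * dimm.data,
                        crc=2 * dimm.crc,
                        parity=2 * dimm.parity - 1,
                        spare=1)


if __name__ == "__main__":
    print(SDDC_DIMM, "total:", SDDC_DIMM.total)
    pair = dddc_pair(SDDC_DIMM)
    print(pair, "total:", pair.total)
```

The pair still contains 36 physical devices, exactly two DIMMs' worth. Only the role of one redundancy device changes.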

Independent Channel Mode

A new feature of the Intel® Xeon® Processor E7 v2 is the ability to run the Intel® SMI2 link at twice the frequency of the memory channels. Each memory channel can therefore have its own memory controller, which operates the memory channel at full speed. The Intel® SMI2 link interleaves the data from the two channels, and the scalable memory buffer separates it again.
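The clock relationship between the two modes can be summarized in a few lines. This is a sketch of the ratios stated in the text only (1:1 in lockstep mode, 2:1 in independent channel mode). The function name is hypothetical, and the memory transfer rate in the example is an illustrative input, not a statement about which speeds a given platform supports.

```python
# Ratio of Intel SMI2 link rate to memory channel rate, per the text:
# same frequency in lockstep mode, twice the frequency in independent mode.
SMI2_TO_MEMORY_RATIO = {"lockstep": 1, "independent": 2}


def smi2_link_rate(memory_rate, mode):
    """Return the SMI2 link rate for a given memory channel rate.
    Units are whatever memory_rate is expressed in (e.g. MT/s)."""
    return SMI2_TO_MEMORY_RATIO[mode] * memory_rate


if __name__ == "__main__":
    example_memory_rate = 1333  # illustrative value, an assumption
    for mode in ("lockstep", "independent"):
        print(mode, smi2_link_rate(example_memory_rate, mode))
```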

Interestingly, you can also observe the difference between the two modes when you run pcm-memory from the Intel Performance Counter Monitor package. The program pcm-memory displays the memory traffic per memory channel. In lockstep mode, pcm-memory consequently displays only 4 memory channels per socket, instead of the 8 memory channels shown in independent channel mode.

---------------------------------------||---------------------------------------
--   Memory Performance Monitoring   --||--   Memory Performance Monitoring   --
---------------------------------------||---------------------------------------
--  Mem Ch 0: Reads (MB/s):   26.17  --||--  Mem Ch 0: Reads (MB/s):   24.85  --
--            Writes(MB/s):   24.49  --||--            Writes(MB/s):   24.09  --
--  Mem Ch 2: Reads (MB/s):    4.91  --||--  Mem Ch 2: Reads (MB/s):    2.18  --
--            Writes(MB/s):    2.37  --||--            Writes(MB/s):    1.39  --
--  Mem Ch 4: Reads (MB/s):   25.33  --||--  Mem Ch 4: Reads (MB/s):   22.79  --
--            Writes(MB/s):   24.28  --||--            Writes(MB/s):   22.77  --
--  Mem Ch 6: Reads (MB/s):    3.14  --||--  Mem Ch 6: Reads (MB/s):    2.09  --
--            Writes(MB/s):    1.66  --||--            Writes(MB/s):    1.34  --
-- NODE0 Mem Read (MB/s):     59.55  --||-- NODE1 Mem Read (MB/s):     51.91  --
-- NODE0 Mem Write (MB/s):    52.81  --||-- NODE1 Mem Write (MB/s):    49.58  --
-- NODE0 P. Write (T/s) :    711390  --||-- NODE1 P. Write (T/s):     711008  --
-- NODE0 Memory (MB/s):      112.36  --||-- NODE1 Memory (MB/s):      101.49  --
---------------------------------------||---------------------------------------
--             Socket 2              --||--             Socket 3              --
---------------------------------------||---------------------------------------
---------------------------------------||---------------------------------------
---------------------------------------||---------------------------------------
--   Memory Performance Monitoring   --||--   Memory Performance Monitoring   --
---------------------------------------||---------------------------------------
--  Mem Ch 0: Reads (MB/s):  112.24  --||--  Mem Ch 0: Reads (MB/s):   24.94  --
--            Writes(MB/s):   24.07  --||--            Writes(MB/s):   24.11  --
--  Mem Ch 2: Reads (MB/s):   89.65  --||--  Mem Ch 2: Reads (MB/s):    2.25  --
--            Writes(MB/s):    1.44  --||--            Writes(MB/s):    1.38  --
--  Mem Ch 4: Reads (MB/s):  110.17  --||--  Mem Ch 4: Reads (MB/s):   22.82  --
--            Writes(MB/s):   22.74  --||--            Writes(MB/s):   22.79  --
--  Mem Ch 6: Reads (MB/s):   89.60  --||--  Mem Ch 6: Reads (MB/s):    2.18  --
--            Writes(MB/s):    1.42  --||--            Writes(MB/s):    1.35  --
-- NODE2 Mem Read (MB/s):    401.66  --||-- NODE3 Mem Read (MB/s):     52.18  --
-- NODE2 Mem Write (MB/s):    49.67  --||-- NODE3 Mem Write (MB/s):    49.63  --
-- NODE2 P. Write (T/s) :    711031  --||-- NODE3 P. Write (T/s):     711011  --
-- NODE2 Memory (MB/s):      451.33  --||-- NODE3 Memory (MB/s):      101.81  --
---------------------------------------||---------------------------------------
--                   System Read Throughput(MB/s):    565.30                  --
--                  System Write Throughput(MB/s):    201.68                  --
--                 System Memory Throughput(MB/s):    766.98                  --
---------------------------------------||---------------------------------------

This system runs in lockstep mode. Therefore, only the even channels 0, 2, 4, and 6 are reported for each socket.
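If you capture this text output, the channel count per socket can be checked with a short script. The following is a minimal sketch that assumes the two-column layout shown above. The output format may differ between pcm-memory versions, the helper names are made up, and the mapping of 4 or 8 channels to a mode simply follows the rule stated in this post.

```python
import re

# Matches cells such as "Mem Ch 0: Reads (MB/s):   26.17".
CHANNEL_RE = re.compile(r"Mem Ch\s+(\d+):\s+Reads \(MB/s\):\s+([\d.]+)")


def channels_per_column(block):
    """Parse one block of side-by-side sockets.
    Returns {column index: {channel number: read MB/s}}."""
    columns = {}
    for line in block.splitlines():
        for col, cell in enumerate(line.split("||")):
            match = CHANNEL_RE.search(cell)
            if match:
                channel, reads = int(match.group(1)), float(match.group(2))
                columns.setdefault(col, {})[channel] = reads
    return columns


def guess_mode(channels):
    """4 reported channels per socket: lockstep; 8: independent channel mode."""
    return {4: "lockstep", 8: "independent"}.get(len(channels), "unknown")


# Rows copied from the first two sockets in the output above.
SAMPLE = """\
--  Mem Ch 0: Reads (MB/s):   26.17  --||--  Mem Ch 0: Reads (MB/s):   24.85  --
--            Writes(MB/s):   24.49  --||--            Writes(MB/s):   24.09  --
--  Mem Ch 2: Reads (MB/s):    4.91  --||--  Mem Ch 2: Reads (MB/s):    2.18  --
--  Mem Ch 4: Reads (MB/s):   25.33  --||--  Mem Ch 4: Reads (MB/s):   22.79  --
--  Mem Ch 6: Reads (MB/s):    3.14  --||--  Mem Ch 6: Reads (MB/s):    2.09  --
"""

if __name__ == "__main__":
    for col, channels in sorted(channels_per_column(SAMPLE).items()):
        print("column", col, sorted(channels), guess_mode(channels))
```

Summing the per-channel read values of the left column also reproduces the NODE0 Mem Read figure in the output, which is a quick sanity check that no channel was missed.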
