Memory Bandwidth on 2-socket Xeon E5-2670 without uncore counters

Hi,

I need to measure memory bandwidth in a data center where each node is a 2-socket Xeon E5-2670.

I know it can be measured with the uncore performance counters (the iMC performance monitoring event CAS_COUNT), as described in the Intel Xeon E5-2600 Product Family Uncore Performance Monitoring Guide, but when I look in /sys/bus/event_source/devices/ there are no uncore counters. (I guess this is because the nodes run an old Linux kernel, 3.0; unfortunately I cannot change this, nor can I be root.)

I have also tried using perf with raw events (with the umask and event code I found in the same Uncore Performance Monitoring Guide), but I am not sure whether these readings are correct. Moreover, I found only one umask and event code, not one for each memory channel.

1. Can anybody comment on this and show me how to get the umask and event code for all channels?

2. Is there a formula for measuring memory bandwidth that uses only core performance counters on the Xeon E5-2670 (something based on LLC misses, prefetches, etc.)?

Thanks for your help,

 Darko

It is not possible to obtain accurate counts of memory bandwidth using only the core performance counters on the Xeon E5-26xx family of processors.

You can count some of the contributors to the memory bandwidth, but not all of them. In particular, there is no counter for L2 prefetches that miss in the L3 cache and no counter for writebacks from L3 to DRAM. Unfortunately these two mechanisms are often the two largest contributors to the total DRAM traffic.

It might be possible to derive the DRAM traffic from the L3 CBo counters, but I have not tried to validate this. If you are running an older Linux kernel you won't have support for these counters either, but since they are in MSRs rather than in PCI configuration space they are somewhat easier to access. You still need root access to open the /dev/cpu/*/msr device driver interfaces, but the interface itself is pretty easy to understand from "rdmsr.c" and "wrmsr.c" in msr-tools-1.2.
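
As a rough illustration of that interface (a minimal sketch, not taken from msr-tools): the msr driver presents each MSR as an 8-byte value at a file offset equal to the MSR address, so a read is one positioned read. The function name read_msr and its path argument are hypothetical; path exists only so the function can be exercised on an ordinary file without root.

```python
import os
import struct

def read_msr(msr_address, cpu=0, path=None):
    """Return the 64-bit value of one MSR on one logical CPU.

    path is a hypothetical override used for testing; by default the
    Linux msr driver device /dev/cpu/<cpu>/msr is opened (needs root).
    """
    if path is None:
        path = f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        # The file offset is the MSR address; each MSR is 8 bytes wide.
        raw = os.pread(fd, 8, msr_address)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path} at offset {msr_address:#x}")
    # MSR contents are returned as a little-endian unsigned 64-bit integer.
    return struct.unpack("<Q", raw)[0]
```

On a real node this would be called as root with an MSR address, for example read_msr(0x1A6) for the first offcore response MSR, assuming the msr kernel module is loaded.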

"Dr. Bandwidth"

Hi John,

Thanks for your response.

I have a few more questions.

Despite not having Linux support for the uncore counters, when I use perf with the raw event value (of the uncore CAS_COUNT) I do read some values. Could you comment on this? Do you think it is really the uncore hardware counter value, or some generic value? (For example, when I use PAPI, the output of papi_native_avail gives me a list that includes CAS_COUNT.)

In the "Intel Xeon E5-2600 Product Family Uncore Performance Monitoring Guide" there is only one CAS_COUNT event, with one event code and umask. It does not give separate umasks and event codes for the individual memory channels (4 channels). Do you know how I could find these values for each channel, so that I can use perf raw events?

Or could you give me some advice on how to approximate DRAM traffic (it doesn't need to be an exact value)?

Thanks for your help,

 Darko

The uncore counters are accessed in a completely different way than the core performance counters, so you can't access them by programming "raw" performance counter events. 

I think that what you are doing in this case is programming a *core* performance counter using EventSelect and UnitMask values that should properly be applied to the PCI configuration space addresses for each of the four memory channels of the integrated memory controller (iMC).
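
For when the iMC counters do become accessible: as far as I understand the uncore guide, the same CAS_COUNT event code is programmed into each channel's own counter block, so there is no per-channel event code to look up, and the per-channel counts are simply summed. A minimal sketch of the arithmetic, assuming each CAS command moves one 64-byte cache line; the function and argument names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: one CAS command transfers one 64-byte line

def dram_bandwidth_gbs(cas_reads, cas_writes, seconds):
    """Estimate DRAM bandwidth in GB/s from per-channel CAS count deltas.

    cas_reads and cas_writes are sequences with one entry per memory
    channel, each the change in the read or write CAS count over the
    measurement interval of the given length in seconds.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total_cas = sum(cas_reads) + sum(cas_writes)
    return total_cas * CACHE_LINE_BYTES / seconds / 1e9
```

With four channels each showing 1,000,000 reads and 500,000 writes in one second, this gives 6,000,000 x 64 bytes = 0.384 GB/s for that socket.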

From the Xeon E5-2600 uncore monitoring guide, the CAS_COUNT event is 0x04, which is not a defined core performance counter event for the Sandy Bridge processor family.

It is not surprising that you get some values -- lots of counters are not documented because they are broken or obsolete.  Sometimes the event is a holdover from previous architectures -- in this case EventSelect 0x04 was an event called "SB_DRAIN.ANY" in the Nehalem and Westmere processors, but there were lots of changes in the numbering of events between those processors and the Sandy Bridge generation, so I would not automatically assume that it is the same event.

It might be possible to get a better estimate of the memory bandwidth by using the "offcore response" counter events. These require programming an additional MSR (0x1A6 for EventSelect B7h and 0x1A7 for EventSelect BBh), which may not be supported by your version of Linux -- I don't remember exactly when support for these events was added. According to Table 19-9 of Volume 3 of the Intel Software Developer's Manual, these counters can count L2 prefetch requests that miss the L3. There are a number of options, but the one called "OFFCORE_RESPONSE.PF_L2_DATA_RD.LLC_MISS.LOCAL_DRAM_N" looks closest to what you want.
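
To make the two-part programming concrete, here is a minimal sketch of building the core event-select value that goes with the offcore response MSR. It uses the architectural IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, enable in bit 22) and assumes unit mask 0x01 for both offcore response events. The request/response bit mask written to 0x1A6 or 0x1A7 is deliberately not shown; it has to come from the SDM table. The function name is hypothetical.

```python
def perfevtsel(event, umask, usr=True, os_=True, enable=True):
    """Build an IA32_PERFEVTSELx value from its basic fields."""
    value = (event & 0xFF) | ((umask & 0xFF) << 8)
    if usr:
        value |= 1 << 16   # count while running in user mode
    if os_:
        value |= 1 << 17   # count while running in kernel mode
    if enable:
        value |= 1 << 22   # enable the counter
    return value

# Assumed unit mask 0x01 for both events; check it against the SDM.
OFFCORE_RESPONSE_0 = perfevtsel(0xB7, 0x01)  # paired with MSR 0x1A6
OFFCORE_RESPONSE_1 = perfevtsel(0xBB, 0x01)  # paired with MSR 0x1A7
```

The counter only produces meaningful numbers once the matching offcore response MSR has also been written, which is the part an older kernel may not do for you.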

There is still no way to count L3 writebacks to memory, and the "offcore response" events can't help (since a writeback has no "response"), but you might be able to close the gap somewhat if the offcore response counter for L2 prefetches that miss in the LLC works correctly.
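
Put together, the best the core counters can give is a lower bound on read traffic. A minimal sketch, assuming you have somehow obtained a demand LLC miss count and an L2 prefetch LLC miss count for the same interval (both argument names are hypothetical, and the 64-byte line size is an assumption in the default):

```python
def dram_read_lower_bound_gbs(demand_llc_misses, l2_prefetch_llc_misses,
                              seconds, line_bytes=64):
    """Lower-bound estimate of DRAM read bandwidth in GB/s.

    Only LLC misses from demand loads and from L2 prefetches are counted.
    L3 writebacks to DRAM are NOT included because the core counters
    cannot see them, so real total traffic can be considerably higher.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    lines_read = demand_llc_misses + l2_prefetch_llc_misses
    return lines_read * line_bytes / seconds / 1e9
```

Treat the result as a floor rather than a measurement: for a write-heavy code the missing writeback traffic can be as large as the read traffic counted here.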

"Dr. Bandwidth"
