pcm and memory bandwidth

Looking at the Intel PCM (release 2.4) source, I see that the PCM memory bandwidth counters are not available on Windows for the Sandy Bridge/Ivy Bridge architectures. Can someone recommend a non-invasive approach to measure memory bandwidth there?

>>>Can someone recommend a non-invasive approach to measure those>>>

What do you mean by a non-invasive approach?

Below is a reply to a similar question on this forum. You can use VTune Amplifier to measure memory bandwidth on Sandy Bridge and Ivy Bridge.

VTune also has the MMIO bandwidth counters for both Linux and Windows (PCM added them in v2.4, but for Linux only).

On Sandy Bridge, you can use these uncore events:
UNC_ARB_TRK_REQUESTS.WRITES     # counts RFOs (reads for ownership) and non-temporal stores; event 0x81, umask 0x20, uncore unit ARB
UNC_ARB_TRK_REQUESTS.EVICTIONS  # counts writebacks; event 0x81, umask 0x80, uncore unit ARB
UNC_CBO_CACHE_LOOKUP.ANY_I      # counts reads, RFOs and non-temporal stores; event 0x34, umask 0x88, uncore unit CBox

These count full cache line transfers (so the number of bytes moved is 64 * event count).
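To make the event encodings above concrete, here is a minimal Python sketch that packs each event number and umask into one raw code, assuming the usual Intel event-select layout (event select in bits 0-7, unit mask in bits 8-15). It also builds Linux perf-style event strings; the PMU names `uncore_arb` and `uncore_cbox_N` are assumptions about what a particular Linux kernel exposes and are not mentioned in this thread, so that part does not help on Windows.

```python
# Sketch: pack event select + umask the way Intel event-select registers
# lay them out (event in bits 0-7, umask in bits 8-15). Assumed layout.
from typing import NamedTuple


class UncoreEvent(NamedTuple):
    name: str
    unit: str   # "ARB" or "CBOX", as listed in the post
    event: int  # event number
    umask: int  # unit mask


EVENTS = [
    UncoreEvent("UNC_ARB_TRK_REQUESTS.WRITES", "ARB", 0x81, 0x20),
    UncoreEvent("UNC_ARB_TRK_REQUESTS.EVICTIONS", "ARB", 0x81, 0x80),
    UncoreEvent("UNC_CBO_CACHE_LOOKUP.ANY_I", "CBOX", 0x34, 0x88),
]


def raw_config(ev: UncoreEvent) -> int:
    """Combine event select (low byte) and umask (next byte) into one raw code."""
    return (ev.umask << 8) | ev.event


def perf_event_string(ev: UncoreEvent, cbox_index: int = 0) -> str:
    """Build a perf-style event string. The PMU names here are hypothetical."""
    pmu = "uncore_arb" if ev.unit == "ARB" else f"uncore_cbox_{cbox_index}"
    return f"{pmu}/event={ev.event:#x},umask={ev.umask:#x}/"


if __name__ == "__main__":
    for ev in EVENTS:
        print(f"{ev.name:32s} raw=0x{raw_config(ev):04x}  {perf_event_string(ev)}")
```

With this layout, EVICTIONS becomes raw code 0x8081 and ANY_I becomes 0x8834.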

There is one CBox unit per core, so you can get the memory reads per core.
There is only one ARB unit per processor, so you don't get the writebacks per core, just a total for the processor.
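The per-core versus per-processor split can be sketched in Python as follows, assuming you already have counter deltas for one measurement interval. The helper names and the sample numbers are hypothetical, for illustration only: each CBox's UNC_CBO_CACHE_LOOKUP.ANY_I delta gives one read figure per CBox, while the single ARB EVICTIONS delta only yields a processor-wide writeback figure.

```python
# Sketch: turn uncore counter deltas into bytes per second.
CACHE_LINE_BYTES = 64  # each event counts one full 64-byte cache line


def per_cbox_read_bw(any_i_deltas: list[int], elapsed_s: float) -> list[float]:
    """Read bandwidth (bytes/s) for each CBox from its ANY_I delta."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return [CACHE_LINE_BYTES * d / elapsed_s for d in any_i_deltas]


def package_writeback_bw(evictions_delta: int, elapsed_s: float) -> float:
    """Writeback bandwidth (bytes/s); only available for the whole processor."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return CACHE_LINE_BYTES * evictions_delta / elapsed_s


if __name__ == "__main__":
    # Hypothetical deltas for a 4-core part over a 0.5 s interval.
    reads = per_cbox_read_bw([1_000_000, 250_000, 0, 500_000], 0.5)
    writes = package_writeback_bw(400_000, 0.5)
    for i, bw in enumerate(reads):
        print(f"CBox {i}: {bw / 1e6:.1f} MB/s read")
    print(f"Processor: {writes / 1e6:.1f} MB/s writeback")
```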
The total memory bandwidth due to the cores is then (with UNC_CBO_CACHE_LOOKUP.ANY_I summed over all CBox units):
64 * (UNC_ARB_TRK_REQUESTS.EVICTIONS + UNC_CBO_CACHE_LOOKUP.ANY_I) / elapsed_time
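Here is the formula above as a small Python sketch. It assumes raw counter readings taken at the start and end of an interval; the `width_bits` parameter of the hypothetical `counter_delta` helper is left to the caller because the actual counter width is not stated in this thread.

```python
# Sketch of: 64 * (EVICTIONS + sum of ANY_I over all CBoxes) / elapsed_time
CACHE_LINE_BYTES = 64


def counter_delta(start: int, end: int, width_bits: int) -> int:
    """Difference between two raw readings, tolerating one wraparound
    of a width_bits-wide counter (width is an assumption supplied by the caller)."""
    return (end - start) % (1 << width_bits)


def total_memory_bw(evictions: int, any_i_per_cbox: list[int], elapsed_s: float) -> float:
    """Total memory bandwidth in bytes/s from interval deltas."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return CACHE_LINE_BYTES * (evictions + sum(any_i_per_cbox)) / elapsed_s


if __name__ == "__main__":
    # Hypothetical deltas: 1,000,000 evictions and 750,000 ANY_I per CBox on 4 CBoxes, over 1 s.
    bw = total_memory_bw(1_000_000, [750_000] * 4, 1.0)
    print(f"{bw / 1e6:.1f} MB/s")
```

With those made-up numbers, 64 * (1,000,000 + 3,000,000) / 1 s works out to 256 MB/s.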

I have verified that the uncore events above are also present on Ivy Bridge.

Hi,

Intel PCM V2.8 now supports memory bandwidth metrics for your processor on Windows as well (via the winpmem driver).

Best regards,

Roman