Cache Miss Rates in Intel® VTune™ Amplifier

Published: 01/28/2015   Last Updated: 07/02/2018

Intel® VTune™ Amplifier can use the Performance Monitoring Units (PMUs) on Intel CPUs to count hardware events and use those events to locate performance issues. The most common way to do this is through the General Exploration analysis type. One set of metrics within General Exploration relates to the memory subsystem and can be found in the Back-End Bound > Memory Bound section of the hierarchy.

A common question we receive about memory metrics is "Can I calculate cache hit and miss rates?" The General Exploration metrics do not include these rates for a very specific reason: the Top-Down characterization in General Exploration attempts to find the hardware bottleneck that is causing REAL performance issues. In VTune Amplifier we have abstracted away the actual cache miss counts and replaced them with L1/L3 and DRAM Bound metrics. We did this because cache misses may or may not actually affect performance. Intel's complex, pipelined, superscalar processors may be able to schedule instructions in such a way that the time spent waiting on an L1 miss, for example, is not a performance issue, because other instructions were able to execute during the wait.

The L1/L3 and DRAM Bound metrics in VTune Amplifier instead count cycles during which the CPU was STALLED waiting for cache misses. This represents a real performance impact. For information on using the metrics in VTune Amplifier, see the tuning guides.
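To make the difference between miss counts and stall cycles concrete, here is a minimal Python sketch. The run names and every number in it are made up for illustration, and the ratio it computes is a simplified stand-in, not the formula VTune Amplifier uses for its Bound metrics.

```python
def stalled_fraction(stall_cycles: int, total_cycles: int) -> float:
    """Share of all cycles in which the CPU was stalled waiting on memory."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return stall_cycles / total_cycles


# Two hypothetical runs with the same number of L1 misses (invented numbers).
runs = {
    # Out-of-order execution overlapped most of the miss latency with other work.
    "latency_hidden": {
        "l1_misses": 1_000_000,
        "stall_cycles": 2_000_000,
        "total_cycles": 100_000_000,
    },
    # The pipeline had nothing else to execute while the misses were outstanding.
    "latency_exposed": {
        "l1_misses": 1_000_000,
        "stall_cycles": 40_000_000,
        "total_cycles": 100_000_000,
    },
}

for name, run in runs.items():
    frac = stalled_fraction(run["stall_cycles"], run["total_cycles"])
    print(f"{name}: {run['l1_misses']} misses, {frac:.0%} of cycles stalled")
```

Both runs report the same miss count, yet only the second one spends a large share of its cycles stalled, which is the situation a stall-based metric is meant to expose.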

Having said that, if you're still interested in counting cache misses, you will need to create a custom VTune Amplifier analysis type to collect the events. The event names may differ slightly depending on your hardware, and not all of them may be available on every platform. The events should have names similar to these:

MEM_LOAD_UOPS_RETIRED.L1_HIT
MEM_LOAD_UOPS_RETIRED.L2_HIT
MEM_LOAD_UOPS_RETIRED.LLC_HIT / MEM_LOAD_UOPS_RETIRED.L3_HIT

MEM_LOAD_UOPS_RETIRED.L1_MISS
MEM_LOAD_UOPS_RETIRED.L2_MISS
MEM_LOAD_UOPS_RETIRED.LLC_MISS / MEM_LOAD_UOPS_RETIRED.L3_MISS

Read the description of each event to determine what it counts and how you would like to use it. Be aware that cache misses and miss rates may provide a characteristic profile of your application, but they do not always correlate with performance issues.
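Once the events have been collected, hit and miss rates are simple ratios of the paired counts. The Python sketch below assumes you have gathered the per-event totals yourself; the counts, the dictionary layout, and the helper name cache_rates are hypothetical, and your platform may use the LLC_* or L3_* spelling for the last-level events.

```python
from typing import Dict, Tuple

PREFIX = "MEM_LOAD_UOPS_RETIRED."


def cache_rates(counts: Dict[str, int], level: str) -> Tuple[float, float]:
    """Return (hit_rate, miss_rate) for one cache level, e.g. "L1".

    The rates are relative to the retired load uops counted at that level
    (hits + misses), not to all memory accesses made by the application.
    """
    hits = counts[f"{PREFIX}{level}_HIT"]
    misses = counts[f"{PREFIX}{level}_MISS"]
    total = hits + misses
    if total == 0:
        raise ValueError(f"no {level} load events were counted")
    return hits / total, misses / total


# Hypothetical event totals, invented for illustration only.
example_counts = {
    PREFIX + "L1_HIT": 9_000_000,
    PREFIX + "L1_MISS": 1_000_000,
    PREFIX + "L2_HIT": 750_000,
    PREFIX + "L2_MISS": 250_000,
}

for level in ("L1", "L2"):
    hit_rate, miss_rate = cache_rates(example_counts, level)
    print(f"{level}: hit rate {hit_rate:.1%}, miss rate {miss_rate:.1%}")
```

Because these events count retired load uops, the resulting rates describe loads only. Check each event's description before treating the ratio as an overall cache miss rate.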

 
