Difference between Sandy Bridge LLC miss events

Wilson R.'s picture

On a Sandy Bridge processor, I'm trying to find out what the difference is between the following two events. This is based on the information in the Intel Software Developer's Manual, Volume 3B, Part 2, in sections 19.1 and 19.3.

  • The architectural performance event "LLC Misses", which is also called LONGEST_LAT_CACHE.MISS Event Num: 0x2E Umask Value: 0x41
  • MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS Event Num: 0xD4 Umask Value: 0x02

I have seen one post stating that the "LLC Misses" event counts LLC misses due to loads and stores, but not LLC misses due to hardware prefetches.
Could someone help explain the difference between these two events?
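
For anyone who wants to program the two events by hand, here is a minimal sketch of how the event number and umask listed above combine into a raw event code. It assumes the standard IA32_PERFEVTSELx layout (event select in bits 0-7, umask in bits 8-15, USR/OS/EN in bits 16, 17 and 22) and the Linux perf convention of writing a raw event as `r` followed by the hex code. The function names are hypothetical and only for illustration.

```python
# Sketch only: function and variable names are illustrative, not from any tool.
# Bit positions follow the IA32_PERFEVTSELx layout in the Intel SDM.
USR = 1 << 16  # count while in user mode
OS = 1 << 17   # count while in kernel mode
EN = 1 << 22   # enable the counter


def raw_event_code(event_num: int, umask: int) -> int:
    """Combine an 8-bit event number and an 8-bit umask into one raw code."""
    if not (0 <= event_num <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event number and umask must each fit in 8 bits")
    return (umask << 8) | event_num


def perf_raw_name(event_num: int, umask: int) -> str:
    """Format the raw code the way Linux perf accepts it (e.g. perf stat -e rXXXX)."""
    return f"r{raw_event_code(event_num, umask):04x}"


def evtsel_value(event_num: int, umask: int, flags: int = USR | OS | EN) -> int:
    """Value to write to an IA32_PERFEVTSELx MSR for user+kernel counting."""
    return raw_event_code(event_num, umask) | flags


# The two events from the question: (event number, umask).
EVENTS = {
    "LONGEST_LAT_CACHE.MISS": (0x2E, 0x41),
    "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS": (0xD4, 0x02),
}

for name, (event_num, umask) in EVENTS.items():
    print(name, perf_raw_name(event_num, umask), hex(evtsel_value(event_num, umask)))
```

With the perf-style names, both events can be requested in a single run so that their counts come from the same execution, which makes the comparison between them more meaningful.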

perfwise's picture

Wilson,

    Event 0x2E counts only demand requests that miss in the L3, that is, requests that originate from an instruction's load and possibly its store.  The other event sounds like it counts the number of uops that missed, but I suspect it's less useful.  You really can't tell from these PMCs what activity is going on in SB/IB.  For instance, you don't know which L2 hardware prefetches hit or missed, you don't know which LLC prefetches were made (these also originate in the L2), and you don't know about the writebacks of modified data.

    To get a better understanding of the L3, I suggest you use PMC 0x34.  This allows you to measure each Cache Block in the SB/IB L3 and track read/write requests which hit or miss.  Also, 0xB0 is very useful for getting an idea of the breakdown of the "types" of requests to the L3.

    I don't work for Intel, but I have learned this through a lot of trial and error. Hope it helps.  I wish there was better documentation and assistance from Intel on their PMCs.  The documentation is poor, or the counters don't work, and it's up to others like us to determine what works and what doesn't.

Perfwise 

iliyapolak's picture

 >>>I wish there was better documentation and assistance from Intel on their PMCs.  The documentation is poor, or they don't work and it's up to others like us to determine what works or doesn't. >>>

The poor documentation could be intentional, to avoid exposing some of the processor's features to the general public.

perfwise's picture

Absolutely, but for many of the counters I've programmed, the results don't make sense or the counters don't work.  Also, some counters work in one revision and not in others.  I'm just voicing the point that it would be quite customer-centric to have some decent backwards support and an explanation of what's going on.  For example, on my SB I can't even measure memory bandwidth.  I'm sure there was support for that for whoever built the chip.  Also, these L3 and hw pref stats from the DC are very misleading.  The question above highlights the difficulties with such limited documentation.

iliyapolak's picture

I suppose that more in-depth information is accessible to some of Intel's partners, like Microsoft and other software companies. You stated in your post that sometimes the results do not make any sense; can you rule out the possibility of a programming error? Regarding the poor documentation, I think it could be described as "some functionality and features are obscured intentionally by design". We are simply not given the full finite state machine representation of the PMU counter implementation, and this can lead to unexpected behaviour and strange results.

iliyapolak's picture

@perfwise

Are you programming the PMU counters under Linux or Windows? If under Windows, how do you display your data?

Wilson R.'s picture

Thank you, Perfwise, for sharing the information.  Based on your answer, that could explain why the number of events for 0xD4 is much larger than for 0x2E.

-Wilson

 
