Problems encountered when I try to get the L1,L2 Cache Miss rate of an Intel Sandy Bridge.

I learned the following formulas for calculating the L1, L2, and L3 miss rates from another post by @Kirill Rogozhin (Intel):

L3 cache miss
(180 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS) / CPU_CLK_UNHALTED.THREAD

L2 cache miss
((26 * MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS) + (43 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS) + (60 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS)) / CPU_CLK_UNHALTED.THREAD

L1 cache miss
((12 * MEM_LOAD_UOPS_RETIRED.L2_HIT) + (26 * MEM_LOAD_RETIRED.LLC_HIT_PS) + (43 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS) + (60 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS) + (180 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)) / CPU_CLK_UNHALTED.THREAD

However, I ran into two problems along the way.

1. When I calculate the L3 miss rate, I get 90%. But my test application is very simple, so the miss rate can't be that high. And when I calculate the L2 miss rate, the result is greater than 1, which is obviously incorrect.

2. When I use the hardware event MEM_LOAD_RETIRED.LLC_HIT_PS, it is reported as an invalid event. But this event should be valid on the Sandy Bridge platform, so I have no idea what is happening.

Any help would be appreciated.

Sun.

>> L1 cache miss

>> ((12 * MEM_LOAD_UOPS_RETIRED.L2_HIT) + (26 * MEM_LOAD_RETIRED.LLC_HIT_PS) + (43 *  MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS) + (60 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS) + (180 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)) / CPU_CLK_UNHALTED.THREAD

This formula can be misleading, because it makes an L1 data cache miss look more expensive than it is. Traditionally, an L1 data miss means an L2 hit, whereas the formula above is the penalty for all L1, L2, and LLC misses combined.

The penalty for L1 misses alone is MEM_LOAD_UOPS_RETIRED.L2_HIT * 12.
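
To make the distinction concrete, here is a minimal Python sketch of the weighted sums from the original post. The counter values are made-up placeholders, not measurements, and the function name penalty_fraction is hypothetical; only the event names and the constants 12, 26, 43, 60, and 180 come from the formulas quoted in this thread. The result estimates a share of cycles spent servicing loads, so it should not be read as a miss rate.

```python
# Cycle cost per event, taken from the formulas quoted in this thread.
PENALTY_CYCLES = {
    "MEM_LOAD_UOPS_RETIRED.L2_HIT": 12,
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS": 26,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS": 43,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS": 60,
    "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS": 180,
}


def penalty_fraction(counts, events, cycles):
    """Estimated share of unhalted cycles spent servicing the given events."""
    weighted = sum(PENALTY_CYCLES[name] * counts[name] for name in events)
    return weighted / cycles


# Made-up example counts (assumption: NOT real measurements).
counts = {
    "MEM_LOAD_UOPS_RETIRED.L2_HIT": 1_000_000,
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS": 200_000,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS": 10_000,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS": 5_000,
    "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS": 50_000,
}
cycles = 1_000_000_000  # stands in for CPU_CLK_UNHALTED.THREAD

# Penalty of L1 misses that hit in L2 only.
l1_only = penalty_fraction(counts, ["MEM_LOAD_UOPS_RETIRED.L2_HIT"], cycles)

# Penalty of all L1/L2/LLC misses combined (the "L1 cache miss" formula).
all_levels = penalty_fraction(counts, list(PENALTY_CYCLES), cycles)

print(f"L1-only penalty share:    {l1_only:.5f}")
print(f"All-levels penalty share: {all_levels:.5f}")
```

With these placeholder counts, the combined figure is more than twice the L1-only figure, which shows how the full sum overstates the cost of L1 misses alone.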

>> 1. When I calculate the L3 miss rate, I get 90%. But my test application is very simple, so the miss rate can't be that high. And when I calculate the L2 miss rate, the result is greater than 1, which is obviously incorrect.

Please provide a test case and tell us which processor you are working on. If the sample is confidential, please consult Intel Premier.

>> 2. When I use the hardware event MEM_LOAD_RETIRED.LLC_HIT_PS, it is reported as an invalid event. But on the Sandy Bridge platform

The event MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS is supported on my Sandy Bridge processor. Note that different processors may use different event names for LLC hits; run "amplxe-runss -event-list | grep LLC_HIT_PS" to check.

@Peter Wang (Intel)

Thanks for your help.

Now I see that the constants in the formulas are the number of cycles needed to service each event.

But what I actually want is the miss ratio of each cache level.

Can I simply use the following formulas given by @Shannon Cepeda (Intel) to get the miss ratios?

Demand Data L2 Miss Rate => 
(sum of all types of L2 demand data misses) / (sum of L2 demanded data requests) => 
(MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS + MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS) / (L2_RQSTS.ALL_DEMAND_DATA_RD)

Demand Data L3 Miss Rate => 
L3 demand data misses / (sum of all types of demand data L3 requests) => 
MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS / (MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS + MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)

Demand Data L1 Miss Rate => cannot calculate.       
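
For illustration, here is a minimal Python sketch of how the two ratios above could be computed from raw event counts. The function name and the sample counts are hypothetical placeholders, not measurements; only the event names and the structure of the formulas come from this post.

```python
def demand_data_miss_ratios(counts):
    """Return (L2 miss ratio, L3 miss ratio) for demand data loads.

    Follows the two formulas quoted above. A ratio is None when its
    denominator is zero.
    """
    llc_miss = counts["MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS"]

    # Every demand data load that missed L2 and therefore reached L3.
    l3_requests = (
        counts["MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS"]
        + counts["MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS"]
        + counts["MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS"]
        + llc_miss
    )
    l2_requests = counts["L2_RQSTS.ALL_DEMAND_DATA_RD"]

    l2_ratio = l3_requests / l2_requests if l2_requests else None
    l3_ratio = llc_miss / l3_requests if l3_requests else None
    return l2_ratio, l3_ratio


# Made-up example counts (assumption: NOT real measurements).
sample = {
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS": 200_000,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS": 10_000,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS": 5_000,
    "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS": 50_000,
    "L2_RQSTS.ALL_DEMAND_DATA_RD": 1_325_000,
}

l2_ratio, l3_ratio = demand_data_miss_ratios(sample)
print(f"Demand data L2 miss ratio: {l2_ratio:.3f}")
print(f"Demand data L3 miss ratio: {l3_ratio:.3f}")
```

Unlike the cycle-weighted formulas, both values here are plain count ratios with no penalty constants involved.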

And I don't understand why we can't calculate the L1 data cache miss ratio.

Thanks.

Sun

My suggestion is to use the formulas from your original post. However,

an L1 cache miss (a hit in L2) is not worth covering in the guideline document, because its penalty is tiny.

If you want to evaluate the impact of L1 misses on performance, use:

L1 cache miss ratio:

 MEM_LOAD_UOPS_RETIRED.L2_HIT * 12 / CPU_CLK_UNHALTED.THREAD,

Investigate if the ratio is greater than 0.2. In practice, as you know, it never reaches this threshold, which is why we can ignore L1 misses.
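
As a small sketch of that check in Python: the function names and the input counts below are hypothetical, while the constant 12 and the 0.2 threshold are the ones given in this reply.

```python
L2_HIT_PENALTY_CYCLES = 12   # cycles per L1 miss that hits in L2 (from this reply)
INVESTIGATE_THRESHOLD = 0.2  # threshold suggested in this reply


def l1_miss_impact(l2_hit_count, unhalted_cycles):
    """MEM_LOAD_UOPS_RETIRED.L2_HIT * 12 / CPU_CLK_UNHALTED.THREAD."""
    return L2_HIT_PENALTY_CYCLES * l2_hit_count / unhalted_cycles


def needs_investigation(l2_hit_count, unhalted_cycles):
    """True when the L1 miss impact exceeds the suggested threshold."""
    return l1_miss_impact(l2_hit_count, unhalted_cycles) > INVESTIGATE_THRESHOLD


# Made-up counts (assumption: NOT real measurements).
impact = l1_miss_impact(1_000_000, 1_000_000_000)
flagged = needs_investigation(1_000_000, 1_000_000_000)
print(f"L1 miss impact: {impact:.3f}, investigate: {flagged}")
```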

@Peter Wang (Intel)

Thanks a lot!
