# I'm still confused by how to calculate the L1, L2 cache miss ratio after reading many related posts.

I'm trying to use VTune to get the L1I, L1D, and L2 cache miss ratios on an Intel Xeon platform based on the Core microarchitecture.

First of all, the miss ratio I want is the traditional one, i.e. L2 misses / total L2 requests, not the per-instruction ratio defined in the Intel manual, such as L2_LINE_MISS.SELF.ANY / INST_RETIRED.ANY.

My questions are:

1) For the L1 cache miss ratio, I'm using the following formula, based on a literal reading of the hardware event names:

I'm using this formula, but I'm not sure whether it is correct or whether I'm missing other hardware events that should go into it.

2) For the L2 miss ratio, I know that the difference between MEM_LOAD_RETIRED.L2_LINE_MISS and L2_LINE_MISS.SELF.ANY is that the latter includes instruction fetch misses. I want the overall L2 miss ratio, including instruction prefetching, so I would like to use L2_LINE_MISS.SELF.ANY as the numerator and the sum of the L1D misses and the L1I misses as the denominator.

So the formula should look like this:

L2 cache miss ratio = L2 misses / total L2 requests = L2_LINE_MISS.SELF.ANY / (MEM_LOAD_RETIRED.L1D_LINE_MISS + L1I_MISSES)

The problem is that when I use this formula to calculate the L2 miss ratio of a program in GraphLab, the numerator is bigger than the denominator, which means the miss ratio is greater than 1. That is obviously incorrect.

So there must be something wrong with the hardware events I used in the formula, and I suspect the problem is in the denominator.
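As a minimal sketch of the formula above, the following Python snippet computes the ratio from raw event counts and flags a result outside [0, 1]. The function names and the counts are hypothetical and chosen only for illustration; they are not measurements from the program discussed here.

```python
def l2_miss_ratio(l2_line_miss, l1d_line_miss, l1i_misses):
    """L2_LINE_MISS.SELF.ANY / (MEM_LOAD_RETIRED.L1D_LINE_MISS + L1I_MISSES)."""
    requests = l1d_line_miss + l1i_misses
    if requests == 0:
        raise ValueError("denominator is zero: no L1 misses were counted")
    return l2_line_miss / requests


def is_plausible(ratio):
    """A miss ratio in the traditional sense must lie between 0 and 1."""
    return 0.0 <= ratio <= 1.0


# Hypothetical counts (not real measurements).
ratio = l2_miss_ratio(l2_line_miss=1_200_000,
                      l1d_line_miss=700_000,
                      l1i_misses=100_000)
print(f"{ratio:.2f}", is_plausible(ratio))  # prints "1.50 False"
```

A result above 1 means the denominator does not cover every request that can produce a miss counted in the numerator.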

I looked for hardware events that represent all L2 requests, but I only found events such as L2_RQST.SELF.ANY.S_STATE, L2_RQST.SELF.ANY.M_STATE, L2_RQST.SELF.ANY.I_STATE, L2_RQST.SELF.ANY.E_STATE, L2_RQST.SELF.ANY.MESI, L2_RQST.SELF.DEMAND.M_STATE, L2_RQST.SELF.DEMAND.S_STATE, L2_RQST.SELF.DEMAND.E_STATE, and L2_RQST.SELF.DEMAND.I_STATE.

These events count L2 requests from different units, or the number of accesses to cache lines in different states.

I have no idea whether I should use any of them in this formula, and if so, how.
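One candidate to try, sketched below in Python, is to use L2_RQST.SELF.ANY.MESI as the denominator. This rests on an assumption that the thread does not confirm: that this event counts every L2 request from this core across all four MESI states, as its name suggests. That should be checked against the event reference for the processor. The function name and the counts are hypothetical.

```python
def l2_miss_ratio_from_requests(counts):
    """Miss ratio with an L2 request event as the denominator.

    Assumption (unverified here): L2_RQST.SELF.ANY.MESI covers all L2
    requests from this core, so the per-state events must not be added
    on top of it.
    """
    misses = counts["L2_LINE_MISS.SELF.ANY"]
    requests = counts["L2_RQST.SELF.ANY.MESI"]
    if requests == 0:
        raise ValueError("no L2 requests were counted")
    return misses / requests


# Hypothetical counts (not real measurements).
counts = {
    "L2_LINE_MISS.SELF.ANY": 250_000,
    "L2_RQST.SELF.ANY.MESI": 1_000_000,
}
print(l2_miss_ratio_from_requests(counts))  # prints 0.25
```

If the ratio still comes out above 1 with this denominator, the assumption about what the event covers is probably wrong for this platform.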

Any help would be appreciated.

Sun.

I suppose it's not coincidental that VTune General analysis quotes L1 hit rates but gives only raw numbers in the various categories of L2 miss. Even those L1 hit rates don't always make sense when I filter down to a single thread, while cache hit rates on idle threads don't have any meaning for me.

I suppose you're ending up counting repeated misses more heavily than repeated access requests.

Quote:

TimP (Intel) wrote:

I suppose it's not coincidental that VTune General analysis quotes L1 hit rates but gives only raw numbers in the various categories of L2 miss. Even those L1 hit rates don't always make sense when I filter down to a single thread, while cache hit rates on idle threads don't have any meaning for me.

I suppose you're ending up counting repeated misses more heavily than repeated access requests.

Right now, I suspect that the denominator is far smaller than the actual number of requests, perhaps because the L1D miss count covers only one or a few cores rather than all of them.
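To illustrate that suspicion, here is a small Python sketch showing how a ratio above 1 can appear when the numerator is summed over all cores but the denominator comes from a single core. The per-core layout, the core names, and all counts are hypothetical.

```python
def total(per_core, event):
    """Sum one event over every core in the per-core table."""
    return sum(core[event] for core in per_core.values())


def l1_misses(core):
    """Denominator terms of the formula for a single core."""
    return core["MEM_LOAD_RETIRED.L1D_LINE_MISS"] + core["L1I_MISSES"]


# Hypothetical per-core counts (not real measurements).
per_core = {
    "core0": {"L2_LINE_MISS.SELF.ANY": 300,
              "MEM_LOAD_RETIRED.L1D_LINE_MISS": 300, "L1I_MISSES": 100},
    "core1": {"L2_LINE_MISS.SELF.ANY": 200,
              "MEM_LOAD_RETIRED.L1D_LINE_MISS": 500, "L1I_MISSES": 100},
}

all_misses = total(per_core, "L2_LINE_MISS.SELF.ANY")

# Mismatched scopes: misses from all cores, requests from core0 only.
mismatched = all_misses / l1_misses(per_core["core0"])

# Matched scopes: both sides summed over the same set of cores.
matched = all_misses / sum(l1_misses(c) for c in per_core.values())

print(mismatched, matched)  # prints "1.25 0.5"
```

Whatever events end up in the formula, both sides need to be collected over the same set of cores and the same time interval.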

I also see that there are two hardware events named L2_IFETCH.BOTH_CORES and L2_IFETCH.SELF, whose descriptions are "counts events initiated by either core" and "counts events initiated by this core only", respectively. I'm quite confused by these two descriptions. What do "either core" and "this core only" mean, given that my processor has 8 cores?

I have been struggling for a month to get the cache miss ratio with VTune, and I still don't know the right method to get a correct answer.