L2 cache miss profiling on E7

I used VTune to profile L2 cache misses of a Java application on a Xeon E7 (Westmere-EX A2). The counter I used is L2_RQSTS.LD_MISS.

To find which memory access causes the cache misses, I dug into the assembly code shown by VTune.

But VTune shows that many of the cache misses happened at instructions that only operate on registers.

For example, here is part of the result from VTune:


Assembly                                L2_RQSTS.LD_MISS  L2_RQSTS.LOADS  L2_RQSTS.MISS  L2_RQSTS.REFERENCES
Block 53:
mov r11d, dword ptr [r12+r10*8+0x34]             400,000                                             400,000
mov edi, dword ptr [r12+r11*8+0xc]             1,600,000         400,000      2,400,000            2,000,000
test edi, edi                                 17,200,000      14,800,000     26,000,000           33,600,000
jz 0x7f6fb2b98a6d <Block 103>


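This is nothing unusual. With event-based sampling, samples are often attributed a few instructions after the one that actually caused the event (so-called skid). Usually, cache misses on an instruction without a memory access turn out to belong to the preceding instruction. In your example, test edi, edi reads edi, which was loaded by the instruction just before it (mov edi, dword ptr [r12+r11*8+0xc]), so that load is the likely source of those misses.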
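If you want to fold skidded samples back by hand, one crude post-processing heuristic is to move the count on each instruction without a memory operand onto the nearest preceding instruction that has one. The sketch below is hypothetical: reattribute_skid is not a VTune feature, and it assumes you have copied Intel-syntax disassembly and one event column out of VTune's assembly view as (instruction, count) pairs. It treats any [...] operand as a memory access, so it would misclassify lea, and it cannot tell which of several earlier loads really caused a given miss.

```python
import re

# Hypothetical skid-correction helper; not part of VTune.
# Any Intel-syntax bracketed operand, e.g. [r12+r10*8+0x34], is treated as a
# memory access (this wrongly includes lea, which does not touch memory).
MEM_OPERAND = re.compile(r"\[[^\]]*\]")


def reattribute_skid(rows):
    """Move counts from instructions without a memory operand to the nearest
    preceding instruction that has one. Counts seen before any memory
    instruction are left where they are."""
    result = [[insn, count] for insn, count in rows]
    last_mem = None  # index of the most recent instruction with a memory operand
    for i, (insn, count) in enumerate(result):
        if MEM_OPERAND.search(insn):
            last_mem = i
        elif count and last_mem is not None:
            result[last_mem][1] += count  # blame the preceding memory access
            result[i][1] = 0
    return [(insn, count) for insn, count in result]


if __name__ == "__main__":
    # L2_RQSTS.LD_MISS column for Block 53 from the VTune output above.
    block_53 = [
        ("mov r11d, dword ptr [r12+r10*8+0x34]", 400_000),
        ("mov edi, dword ptr [r12+r11*8+0xc]", 1_600_000),
        ("test edi, edi", 17_200_000),
        ("jz 0x7f6fb2b98a6d <Block 103>", 0),
    ]
    for insn, count in reattribute_skid(block_53):
        print(f"{count:>12,}  {insn}")
```

Applied to the L2_RQSTS.LD_MISS column above, this moves the 17,200,000 samples on test edi, edi onto the load that writes edi. Treat the result as a hint to check against the data flow, not as exact attribution.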