L2 cache miss profiling on E7

I used VTune to profile L2 cache misses of a Java application on a Xeon E7 (Westmere-EX A2). The counter I used is L2_RQSTS.LD_MISS.

To find which memory accesses cause the cache misses, I dug into the assembly view provided by VTune.

But VTune shows that many of the cache misses are attributed to instructions that only operate on registers.

For example, the following is part of the result from VTune:

Assembly                                L2_RQSTS.LD_MISS  L2_RQSTS.LOADS  L2_RQSTS.MISS  L2_RQSTS.REFERENCES
Block 53:
mov r11d, dword ptr [r12+r10*8+0x34]             400,000                                             400,000
mov edi, dword ptr [r12+r11*8+0xc]             1,600,000         400,000      2,400,000            2,000,000
test edi, edi                                 17,200,000      14,800,000     26,000,000           33,600,000
jz 0x7f6fb2b98a6d <Block 103>


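This is nothing unusual. Samples are often off by a few instructions (this is usually called skid). Cache misses reported on an instruction without a memory access usually turn out to belong to a preceding instruction. In your example, the misses shown on test edi, edi most likely belong to the load just before it, mov edi, dword ptr [r12+r11*8+0xc].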
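
If you want a rough corrected view, you can post-process the per-instruction counts yourself. The following is only a sketch of that idea, not something VTune does: it assumes that every sample landing on a register-only instruction really belongs to the nearest preceding instruction with a memory operand in the same block, which is a heuristic and can be wrong when the skid is longer or crosses a branch. The function name reattribute_skid and the bracket test for "has a memory operand" are my own assumptions (note that lea uses brackets without touching memory).

```python
def reattribute_skid(rows):
    """Move sample counts from register-only instructions back to the
    nearest preceding instruction that has a memory operand.

    rows: list of (instruction_text, sample_count) in program order.
    Heuristic only: a '[' in the text is taken to mean a memory operand.
    """
    result = [[text, 0] for text, _ in rows]
    last_mem = None  # index of the most recent memory-accessing instruction
    for i, (text, count) in enumerate(rows):
        if "[" in text:
            last_mem = i
        # With no earlier memory access in the block, leave the count in place.
        target = last_mem if last_mem is not None else i
        result[target][1] += count
    return [(text, count) for text, count in result]


# L2_RQSTS.LD_MISS counts for Block 53, taken from the question.
block_53 = [
    ("mov r11d, dword ptr [r12+r10*8+0x34]", 400_000),
    ("mov edi, dword ptr [r12+r11*8+0xc]", 1_600_000),
    ("test edi, edi", 17_200_000),
    ("jz 0x7f6fb2b98a6d", 0),
]

for text, count in reattribute_skid(block_53):
    print(f"{text:40s} {count:>12,}")
```

With the numbers above, the 17,200,000 misses shown on test edi, edi are added to the second mov, which then carries 18,800,000. Treat that as a hint about where to look, not as a measurement.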