Confusing results for L2_DATA_READ_MISS_MEM_FILL from VTune

Dear all,

I am trying to profile the cache performance of an application and noticed something strange in the VTune results.

vpshufd instructions seem to have positive values for L2_DATA_READ_MISS_MEM_FILL even when the source and destination operands are all registers.

Address   Source Line  Assembly                            L2_DATA_READ_MISS_MEM_FILL  CPU_CLK_UNHALTED
0x407afa  367          vpshufd $0x44, %zmm26, %k0, %zmm27  1,600,000                   24,000,036

I noticed this statement about the event in the KNC PMU events reference: "Can include promoted read misses that started as CODE accesses."

Is this likely to be the reason for this?  If so, what does it actually mean?

Best regards,

Alastair

Did you look for memory data dependencies, including these registers?

I have not looked at this on Xeon Phi, but on most processors there is often a bit of skew between the instruction that caused a performance counter overflow and the instruction identified by the interrupt.   I think there is a good chance that the memory reference that incremented the L2_DATA_READ_MISS_MEM_FILL counter event is one or a few instructions upstream of the VPSHUFD instruction.
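
As a rough illustration of the skew idea above, here is a minimal Python sketch that scans a disassembly listing for register-only instructions carrying memory-event samples and lists the memory-referencing instructions a few slots upstream as likely true sources. The listing, the `skid_candidates` helper, and the window of three instructions are all assumptions made for illustration; only the vpshufd row is taken from this thread, and the other rows (including their addresses and approximate syntax) are hypothetical.

```python
# Hypothetical listing: (address, AT&T-style assembly, sampled event count).
# Only the vpshufd row comes from the thread; the other two rows are
# made up for illustration and their syntax is approximate.
LISTING = [
    (0x407AE8, "vgatherdpd (%rax,%zmm25,8), %zmm26{%k1}", 0),
    (0x407AF4, "vaddpd %zmm1, %zmm2, %zmm3", 0),
    (0x407AFA, "vpshufd $0x44, %zmm26, %k0, %zmm27", 1_600_000),
]


def references_memory(asm: str) -> bool:
    """AT&T syntax writes memory operands with parentheses, e.g. (%rax)."""
    return "(" in asm


def skid_candidates(listing, window=3):
    """Map each register-only instruction that has a nonzero memory-event
    count to the memory-referencing instructions found up to `window`
    entries earlier in the listing."""
    result = {}
    for i, (addr, asm, count) in enumerate(listing):
        if count > 0 and not references_memory(asm):
            start = max(0, i - window)
            result[addr] = [
                a for a, s, _ in listing[start:i] if references_memory(s)
            ]
    return result


if __name__ == "__main__":
    for addr, candidates in skid_candidates(LISTING).items():
        print(hex(addr), "<- possible sources:", [hex(c) for c in candidates])
```

Note that this only walks the static listing order, so it is a heuristic at best: across branches and loop back-edges, the instruction that actually executed just before the sampled one may sit elsewhere in the listing.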
 

John D. McCalpin, PhD
"Dr. Bandwidth"

Quote:

Tim Prince wrote:

Did you look for memory data dependencies, including these registers?

Hi Tim,

Thanks for your response. The zmm register in question is loaded by an _mm512_mask_i32logather_pd intrinsic.

Does this mean that the L2 miss might originate from there? Would those misses not be attributed to the gather itself?

Best regards,

Alastair

Quote:

John D. McCalpin wrote:

I have not looked at this on Xeon Phi, but on most processors there is often a bit of skew between the instruction that caused a performance counter overflow and the instruction identified by the interrupt.   I think there is a good chance that the memory reference that incremented the L2_DATA_READ_MISS_MEM_FILL counter event is one or a few instructions upstream of the VPSHUFD instruction.

Hi John,

Thanks for replying. That is a good point; I will go back and look at the code to see if there are any likely candidates. As I mentioned in my other reply, this register is loaded by a gather, so I will check whether that is nearby.

Best regards,

Alastair
