Calculating prefetches that missed L2




I am currently running performance tests on some offload code for Xeon Phi. I have been deriving performance numbers from hardware counters measured with PAPI, using the calculation methods explained here:


However, in the memory bandwidth section (5.4), the guide says to use an event named HWP_L2MISS to count the number of hardware prefetches that missed L2. VTune apparently provides this event, but it does not appear in the list of available events in the PMU document here:


I assume it is a derived metric that VTune works out for you, but does anyone know how it should be calculated? Could I add up the prefetch0 and prefetch1 requests that missed L2, as reported by the counters L2_DATA_PF1_MISS and L2_DATA_PF2_MISS, or is there more to it?






Hi Tim,

Let me ask the experts here and get back to you. Thank you.

Although they are not documented in the Intel Xeon Phi Performance Monitoring Units guide (document 327357-001), Intel's VTune includes performance monitor events that appear to be what you are looking for:

Event 0xC3, Umask 0x10: HWP_L2HIT : Hardware Prefetch L2 HIT
Event 0xC4, Umask 0x10: HWP_L2MISS : Hardware Prefetch L2 MISS

The VTune "knc_db.txt" file indicates that all of the events using Umask 0x10 should use counter 0 only, but I don't see that indicated anywhere in the documentation.

John D. McCalpin, PhD
"Dr. Bandwidth"
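
For anyone who wants to program these two events through a raw-event interface rather than by name, here is a minimal Python sketch of how an event select and unit mask are commonly combined into one raw code. It assumes the conventional Intel PERFEVTSEL layout (event select in bits 0-7, unit mask in bits 8-15); nothing in this thread confirms that the Xeon Phi PMU or any particular tool expects exactly this encoding, and the function name `raw_event_code` is made up for illustration.

```python
# Sketch only: assumes the conventional Intel PERFEVTSEL bit layout
# (event select in bits 0-7, unit mask in bits 8-15). Enable, user and
# kernel bits are deliberately left out.

def raw_event_code(event_select: int, umask: int) -> int:
    """Pack an 8-bit event select and an 8-bit unit mask into one value."""
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event select must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("unit mask must fit in 8 bits")
    return (umask << 8) | event_select


# Event numbers and unit masks as quoted from knc_db.txt in the post above.
HWP_EVENTS = {
    "HWP_L2HIT": (0xC3, 0x10),
    "HWP_L2MISS": (0xC4, 0x10),
}

for name, (event_select, umask) in HWP_EVENTS.items():
    print(f"{name}: 0x{raw_event_code(event_select, umask):04X}")
```

Under that assumption HWP_L2MISS comes out as 0x10C4. Whether the counter-0-only restriction mentioned above applies would still need to be checked separately.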

Hi Tim, 

HWP_L2MISS is an actual PMU event. I can see it in the list of events in Intel VTune Amplifier XE when I configure a custom analysis.





Thanks for the assistance, guys, especially for mentioning the knc_db.txt file. I found that file in the VTune installation directory and it answered a fair few of my questions.

However, I note that the event John mentioned:

Event 0xC3, Umask 0x10: HWP_L2HIT : Hardware Prefetch L2 HIT

does not seem to be available in VTune, nor does it appear in the knc_db.txt file I have.


Just to note for anyone else: some of the events available in VTune are not available through PAPI (they are not listed by papi_native_avail), for example:




Tim D. wrote:

Does not seem available in VTune, or appear in the knc_db txt file I have.


It is available. Just add

-knob event-config=HWP_L2MISS:sa=1000003

to your VTune command-line script.

Mostly I use the following command:

amplxe-cl -collect-with runsa-knc -knob event-config=BRANCHES:sa=1000003,BRANCHES_MISPREDICTED:sa=1000003,CPU_CLK_UNHALTED:sa=10000000,DATA_CACHE_LINES_WRITTEN_BACK:sa=1000003,DATA_PAGE_WALK:sa=1000003,EXEC_STAGE_CYCLES:sa=10000000,HWP_L2MISS:sa=1000003,INSTRUCTIONS_EXECUTED:sa=10000000,L2_READ_HIT_E:sa=1000003,L2_READ_HIT_M:sa=1000003,L2_READ_HIT_S:sa=1000003,L2_READ_MISS:sa=1000003,L2_WRITE_HIT:sa=1000003,LONG_DATA_PAGE_WALK:sa=1000003,VPU_INSTRUCTIONS_EXECUTED:sa=1000003
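
Long event lists like this are easy to mistype, so it can help to generate the knob string instead of editing it by hand. The following is a rough Python sketch that builds the `event-config` value in the `NAME:sa=N` format used above. The helper names `build_event_config` and `build_command` are hypothetical, and the target application still has to be appended to the argument list, because the command above does not show it.

```python
from typing import List, Mapping

# Sketch only: builds the "-knob event-config=..." argument in the
# NAME:sa=N,NAME:sa=N format shown in the command above.
# The helper names here are hypothetical, not part of VTune.

def build_event_config(events: Mapping[str, int]) -> str:
    """Join event names and sample-after values into one knob value."""
    return ",".join(f"{name}:sa={sample_after}"
                    for name, sample_after in events.items())


def build_command(events: Mapping[str, int]) -> List[str]:
    """Return the amplxe-cl argument list, without the target application."""
    return [
        "amplxe-cl",
        "-collect-with", "runsa-knc",
        "-knob", f"event-config={build_event_config(events)}",
    ]


if __name__ == "__main__":
    events = {
        "HWP_L2MISS": 1000003,
        "CPU_CLK_UNHALTED": 10000000,
    }
    print(" ".join(build_command(events)))
```

Keeping the events in a dictionary makes it simple to add or drop one event at a time when checking which names a given VTune installation accepts.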

I was referring to the event HWP_L2HIT mentioned by John, not HWP_L2MISS. I am not actually concerned with monitoring HWP_L2HIT at the moment; I was simply commenting that I did not see this event in the custom analysis event menu, nor in the knc_db file John referenced.


Thanks for the example of the command you use, though. This is useful.

I tried reading HWP_L2HIT and HWP_L2MISS, and both showed "0" on all cores. How can I verify whether this is because hardware prefetching (HWP) is switched on or off?

That looks like the wrong event: the HWP_L2MISS event is Event 0xC4, not 0x03.

I definitely get non-zero counts for HWP_L2HIT. I am not sure yet whether they make sense; that will take a lot more experimenting.

John D. McCalpin, PhD
"Dr. Bandwidth"
