I am currently running performance tests on offload code for Xeon Phi. I have been deriving performance numbers from hardware counters measured with PAPI, using the calculation methods explained here:
However, in the memory bandwidth section (5.4), the guide says to use an event named HWP_L2MISS to count the number of hardware prefetches that missed L2. VTune apparently provides this event, but it does not appear in the list of available events in the PMU document here:
I assume it is a derived metric that VTune works out for you. Does anyone know how it should be calculated? Could I simply add the prefetch0 and prefetch1 requests that missed L2, as reported by the counters L2_DATA_PF1_MISS and L2_DATA_PF2_MISS, or is there more to it?
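
To make the question concrete, here is a rough sketch of the arithmetic I have in mind. It only illustrates the proposed sum. Whether that sum actually matches what VTune reports as HWP_L2MISS is exactly what I am unsure about. The function names are made up for illustration, the counter values are assumed to have been read already (for example through PAPI), and I am assuming that each miss moves one 64-byte cache line.

```python
# Sketch only: a guessed stand-in for HWP_L2MISS, not a confirmed definition.
# Assumption: each counted miss transfers one 64-byte cache line.
CACHE_LINE_BYTES = 64


def estimated_prefetch_l2_misses(pf1_miss: int, pf2_miss: int) -> int:
    """Sum the two prefetch-miss counters (L2_DATA_PF1_MISS, L2_DATA_PF2_MISS).

    Hypothetical helper: this is the guess from the question above.
    """
    return pf1_miss + pf2_miss


def prefetch_miss_bytes_per_second(pf1_miss: int, pf2_miss: int,
                                   elapsed_seconds: float) -> float:
    """Convert the summed miss count into bytes per second over the measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    misses = estimated_prefetch_l2_misses(pf1_miss, pf2_miss)
    return misses * CACHE_LINE_BYTES / elapsed_seconds
```

If the guess is right, this value would take the place of the HWP_L2MISS term in the guide's bandwidth calculation. If HWP_L2MISS counts something else, the sum would need to be replaced accordingly.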