Recently, when I was creating tutorial videos on Intel VTune Amplifier XE, I came across something that didn’t quite make sense. I was looking at the hardware event counts for a workload (shown below).
What puzzled me was how Intel VTune Amplifier estimated the hardware event counts from the event sample counts. If you don’t quite understand what I am talking about, you should probably read this blog post (http://software.intel.com/en-us/blogs/2013/05/29/sanity-check-statistical-data-validity-of-intel-vtune-amplifier-xe-results), which sheds light on the inner workings of Intel VTune Amplifier. Coming back to the topic at hand, when I first saw my results, I expected the hardware event counts to be calculated using the formula:
Hardware event counts = event sample counts * events per sample (a.k.a. Sample After Value)
However, from the above image, it is clear that this is not the case. When I asked one of the Intel VTune Amplifier developers, the mysterious workings of Intel VTune Amplifier were revealed to me. The formula that Intel VTune Amplifier uses to estimate the number of hardware events depends on whether you choose to multiplex the hardware events or to do multiple runs. If you choose to do multiple runs, the formula above is exactly right.
However, if you choose to multiplex your hardware events, things are a little different. When profiling applications on the Intel® Xeon Phi™ coprocessor using hardware event multiplexing, Intel VTune Amplifier internally creates multiplexing groups with at most two events per group and tries to sample these groups in a round-robin fashion. Each group is limited to two events because the Performance Monitoring Unit (PMU) in the Intel Xeon Phi coprocessor can monitor at most two events at any given time. In my case, since I had 14 events, Intel VTune Amplifier internally created 7 multiplexing groups. When the hardware event collection completes and it is time to estimate the total number of events from the sample counts, Intel VTune Amplifier uses the following formula:
Hardware event counts = event sample counts * events per sample * number of multiplexing groups
Intel VTune Amplifier adds a third term (number of multiplexing groups) to the equation to account for the time during which the PMU is sampling other events: with N groups taking turns in round-robin order, each event is monitored for roughly 1/N of the run. This scaling does assume, however, that the application’s behavior is in a steady state throughout the sampling period. And this is how Intel VTune Amplifier XE estimates the number of hardware events from the event sample counts.
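To make the arithmetic concrete, here is a minimal Python sketch of both estimates. It is not VTune Amplifier code: the function names, the sample count of 1,000, and the Sample After Value of 2,000,000 are made up for illustration, and rounding the group count up for an odd number of events is my own assumption (the real tool may group events differently).

```python
import math


def multiplex_group_count(num_events, events_per_group=2):
    """Number of multiplexing groups needed for num_events.

    Assumption: events are packed events_per_group at a time and any
    leftover event gets its own group (hence the ceiling).
    """
    return math.ceil(num_events / events_per_group)


def estimate_event_count(sample_count, sample_after_value, num_groups=1):
    """Estimated hardware event count.

    num_groups=1 corresponds to the multiple-runs case, where each event
    is monitored for the whole run. With multiplexing, the estimate is
    scaled by the number of groups to cover the time spent on other groups.
    """
    return sample_count * sample_after_value * num_groups


# 14 events, two PMU counters: 7 groups, as in the case described above.
groups = multiplex_group_count(14)

# Hypothetical numbers: 1,000 samples at a Sample After Value of 2,000,000.
multiple_runs_estimate = estimate_event_count(1_000, 2_000_000)
multiplexed_estimate = estimate_event_count(1_000, 2_000_000, groups)

print(groups)                  # 7
print(multiple_runs_estimate)  # 2000000000
print(multiplexed_estimate)    # 14000000000
```

For the same sample count, the multiplexed estimate is 7 times the multiple-runs estimate, which is exactly the kind of gap that made the numbers in my results look wrong at first glance.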