CPU_CLK_UNHALTED for multithreaded application.

I have the following question:

When we collect hardware events for a multithreaded program on Xeon Phi, are the statistics for each event reported cumulatively or per thread? For example, is CPU_CLK_UNHALTED like the CPU time reported by the Linux 'time' command, in that it gives the cumulative clock cycles used by the application across the defined number of threads, unlike the elapsed time? Is this correct?

How are cache_fill and other hardware events reported? In general, are they accumulated over all cores, or counted only for the one core specified in the -collect cpu-mask?


According to the Software Developer's Guide, there are 2 PMU counters per thread, so I believe the PMU counts are measured per thread. However, let me confirm this with a VTune expert and get back to you.

Also, the value reported to you can be cumulative or per thread, depending on the type of analysis, the report, and the grouping selected. You can always select the 'Core/thread/function/call stack' grouping to see the counts for individual threads rather than for whole cores.
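To illustrate how the same per-thread counts can appear either cumulative or per thread depending on the grouping, here is a minimal Python sketch. It is not VTune code or VTune output: the sample tuples, the counts, and the `aggregate` helper are all made up for illustration, and it only assumes that each hardware thread contributes its own count for an event.

```python
from collections import defaultdict

# Hypothetical per-thread counts: (core_id, thread_id, event_name, count).
# These numbers are invented for illustration, not measured values.
samples = [
    (0, 0, "CPU_CLK_UNHALTED", 1_000),
    (0, 1, "CPU_CLK_UNHALTED", 1_200),
    (1, 0, "CPU_CLK_UNHALTED", 900),
    (1, 1, "CPU_CLK_UNHALTED", 1_100),
]


def aggregate(samples, key):
    """Sum event counts over all samples that share the same grouping key."""
    totals = defaultdict(int)
    for core, thread, event, count in samples:
        totals[key(core, thread, event)] += count
    return dict(totals)


# Finest grouping: one value per (core, thread, event).
per_thread = aggregate(samples, lambda c, t, e: (c, t, e))

# Coarser grouping: the threads of each core are summed together.
per_core = aggregate(samples, lambda c, t, e: (c, e))

# Coarsest grouping: one cumulative value per event for the whole application.
cumulative = aggregate(samples, lambda c, t, e: e)

print(per_thread)
print(per_core)
print(cumulative)
```

With a cumulative grouping, the CPU_CLK_UNHALTED total behaves like CPU time summed over threads, so it can exceed the cycles that elapsed on any single thread.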

I hope that answers your question. 

Duplicate of http://software.intel.com/en-us/forums/topic/499800 (has different responses).
