Is 'Collect Highly Accurate CPU Time' Option Required?

You may run into the following warning when using the VTune Amplifier XE for Windows*:

[Image: Warning.png]

This warning appears only when running the tool on Windows operating systems. You may wonder what the implications are of not using this capability. This article attempts to answer that question.

First, let's review the following excerpt from the Amplifier XE documentation:

By default, the Intel® VTune™ Amplifier XE detects CPU time based on the OS scheduler tick granularity. As a result, the CPU time values may be inaccurate for targets that execute in short quanta less than the OS scheduler tick interval (for example, frame-by-frame computation in video decoders).

For the analysis types using the user-mode sampling and tracing collection (Hotspots, Concurrency, and Locks and Waits), you can choose to enable more accurate collection of CPU time information from the Analysis Type configuration window, which is available when you click the New Analysis button on the toolbar.


What the documentation doesn’t tell you is how the tool collects “more accurate” CPU time information: it uses the Event Tracing for Windows (ETW) capability. Without ETW, a sample is taken every 10 ms. At each sample, the OS is queried for the amount of time the thread has executed, and the difference from the previous sample's value gives a delta. The information the OS returns through this mechanism has a coarse granularity. The deltas are then totaled and displayed in the Amplifier XE user interface. With ETW enabled, Amplifier XE can instead use the context switch information acquired from ETW to filter out any time spent executing other threads and accurately calculate the time for monitored threads within each 10 ms sample. Based on this additional information, the “CPU time” calculated for the function or thread is more accurate.
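To make the difference concrete, here is a minimal Python sketch of the two accounting styles. It is a simplified model and not how Amplifier XE or Windows is actually implemented: the function names, the 15 ms tick interval, and the video-decoder-like workload are all assumptions chosen for illustration. The coarse model charges a whole scheduler tick to the thread if it happens to be on-CPU at the tick instant; the trace model sums the exact on-CPU intervals, which is the kind of information context switch events provide.

```python
from typing import List, Tuple

# (start_us, end_us), half-open: the thread is on-CPU during this span.
Interval = Tuple[int, int]


def is_running(intervals: List[Interval], t_us: int) -> bool:
    """True if the thread is on-CPU at instant t_us."""
    return any(start <= t_us < end for start, end in intervals)


def tick_based_cpu_time(intervals: List[Interval], until_us: int, tick_us: int) -> int:
    """Coarse counter (assumed model): at every scheduler tick before
    until_us, charge one whole tick if the thread is on-CPU at that instant."""
    return sum(tick_us for t in range(0, until_us, tick_us) if is_running(intervals, t))


def sampled_cpu_time(intervals: List[Interval], duration_us: int,
                     sample_us: int, tick_us: int) -> int:
    """Query the coarse counter every sample_us and total the deltas."""
    total = 0
    previous = 0
    for t in range(sample_us, duration_us + 1, sample_us):
        current = tick_based_cpu_time(intervals, t, tick_us)
        total += current - previous  # delta since the previous sample
        previous = current
    return total


def trace_based_cpu_time(intervals: List[Interval], duration_us: int) -> int:
    """Exact on-CPU time, as context switch timestamps would allow."""
    return sum(max(0, min(end, duration_us) - start) for start, end in intervals)


if __name__ == "__main__":
    # Hypothetical workload: 2 ms of work every 33 ms, observed for 1 second.
    frames = [(k * 33_000, k * 33_000 + 2_000) for k in range(30)]
    coarse = sampled_cpu_time(frames, 1_000_000, sample_us=10_000, tick_us=15_000)
    exact = trace_based_cpu_time(frames, 1_000_000)
    print(f"coarse estimate: {coarse} us, trace-based: {exact} us")
```

Because the deltas telescope, the sampled total can never be more precise than the underlying coarse counter, no matter how often it is queried. That is why extra information such as context switch timestamps is needed to improve accuracy.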

Does this mean the data is useless without ETW? No. It depends on what else is executing on the system during data collection and on the structure of your application. In specific cases, we have observed about a 3% variation between “normal” and “highly accurate” CPU time.

However, there are corner cases where the difference could be as high as 30% or 40%. If, without ETW, a thread is executing most of the time but happens to be inactive each time a 10 ms sample is taken, the results would grossly underrepresent its execution time. Conversely, if a thread is mostly inactive but runs exactly at the frequency of the 10 ms samples, it may appear to consume a large amount of time when in reality it does not.
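The two corner cases can be reproduced with a toy model. The Python sketch below is illustrative only: it assumes a periodic thread and an estimator that charges a full 10 ms interval whenever the thread is on-CPU at the check instant. Both the estimator and the function names are hypothetical simplifications, and the workloads are deliberately extreme, so the errors are much larger than the 30% to 40% mentioned above.

```python
def on_cpu(t_us: int, period_us: int, run_start_us: int, run_len_us: int) -> bool:
    """True if a periodic thread is on-CPU at instant t_us. In each period
    it runs from run_start_us for run_len_us and is idle otherwise."""
    phase = t_us % period_us
    return run_start_us <= phase < run_start_us + run_len_us


def estimated_cpu_us(duration_us: int, check_us: int, period_us: int,
                     run_start_us: int, run_len_us: int) -> int:
    """Assumed coarse estimator: charge a whole check interval whenever
    the thread is on-CPU at the check instant."""
    return sum(check_us
               for t in range(0, duration_us, check_us)
               if on_cpu(t, period_us, run_start_us, run_len_us))


def actual_cpu_us(duration_us: int, period_us: int, run_len_us: int) -> int:
    """Exact on-CPU time; valid when duration_us is a multiple of period_us."""
    return (duration_us // period_us) * run_len_us


if __name__ == "__main__":
    second = 1_000_000
    # Case 1: busy 9 ms out of every 10 ms, but idle at each check instant.
    print("mostly busy:", estimated_cpu_us(second, 10_000, 10_000, 1_000, 9_000),
          "estimated vs", actual_cpu_us(second, 10_000, 9_000), "actual (us)")
    # Case 2: busy only 1 ms out of every 10 ms, exactly at each check instant.
    print("mostly idle:", estimated_cpu_us(second, 10_000, 10_000, 0, 1_000),
          "estimated vs", actual_cpu_us(second, 10_000, 1_000), "actual (us)")
```

Real workloads rarely align this perfectly with the check interval, which is consistent with the small variation observed in typical cases.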

The best thing to do is to test it yourself, if possible: collect Hotspots data with and without this option enabled and compare the results. This will tell you whether running without “Highly accurate CPU time” produces results accurate enough to direct your optimization efforts, or whether you need Administrative privileges so that you can enable the option.
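One way to make that comparison systematic is sketched below in Python. It assumes you have already copied the per-function CPU times (in seconds) from both runs into two dictionaries by whatever means is convenient. The helper names, the 10% threshold, and the sample function names and values are all hypothetical and are not taken from a real collection.

```python
from typing import Dict, List, Tuple


def compare_cpu_times(normal: Dict[str, float], accurate: Dict[str, float],
                      threshold: float = 0.10) -> List[Tuple[str, float, float, float]]:
    """Return (function, normal_s, accurate_s, relative_diff) rows whose
    relative difference exceeds threshold, largest difference first."""
    rows = []
    for name in sorted(set(normal) | set(accurate)):
        n = normal.get(name, 0.0)
        a = accurate.get(name, 0.0)
        baseline = max(n, a)
        rel = abs(n - a) / baseline if baseline > 0 else 0.0
        if rel > threshold:
            rows.append((name, n, a, rel))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def same_top_hotspots(normal: Dict[str, float], accurate: Dict[str, float],
                      k: int = 5) -> bool:
    """True if both runs rank the same k functions as the top hotspots,
    in the same order."""
    def top(times: Dict[str, float]) -> List[str]:
        return sorted(times, key=times.get, reverse=True)[:k]
    return top(normal) == top(accurate)


if __name__ == "__main__":
    # Hypothetical per-function CPU times in seconds, for illustration only.
    normal_run = {"decode_frame": 4.0, "render": 2.0, "idle_poll": 1.0}
    accurate_run = {"decode_frame": 4.1, "render": 2.0, "idle_poll": 0.5}
    for name, n, a, rel in compare_cpu_times(normal_run, accurate_run):
        print(f"{name}: normal={n:.2f}s accurate={a:.2f}s diff={rel:.0%}")
    print("same top-2 hotspots:", same_top_hotspots(normal_run, accurate_run, k=2))
```

If the top hotspots match and only minor functions differ, the normal collection is probably sufficient to direct optimization work. Large differences in the top entries suggest the option is worth enabling.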

However, if you are restricted from using highly accurate CPU time because of your corporation’s policies, you can, in general, be confident that analysis of your application’s performance is valid using “normal” Hotspots data collection.

Please post any comments or questions regarding this article to the VTune™ Amplifier XE forum.