I wrote up the problem report below and then, after some more tinkering, discovered that I could avoid the problem by turning off calibration, adding
"-cal no -si 1"
to my vtl command line.

I'm posting the report anyway, in case anyone knows more about the cause, or in case anyone else is seeing something similar and wishing they had a workaround.


VTune 1.1 reports different CPU utilization than sar(8)

HW: 4-way Xeon (hyper-threaded 8-way) IBM440
OS: SLES 8, kernel 2.4.19-64GB-SMP
SW: VTune 1.1 for Linux

I've been comparing the scaling of a fabric
I/O-bound application when multiple processes
are executed.

When I run one copy of the process, I run it
directly under vtl
vtl -d 60 -c sampling
-app ,""

When I run multiple copies of the process, I
start all but one instance from a tight shell
loop and put them in the background, and then
run the last one under vtl, as above.

In either case, I also ran "sar 10 7" in the
background while vtl was executing.

With 1 instance of :
- reports throughput of io/sec
- sar reports: 4% usr, 4% system, 92% idle
- vtune reports 70% of event samples in the
function default_idle in the vmlinux module

With 20 instances of :
- reports throughput of <3X> io/sec
- sar reports: 23% usr, 77% system, 0% idle
- vtune reports 70% of event samples in the
function default_idle in the vmlinux module.

The vtl command I'm using to view the results is
vtl view aXX::r1 -ha -mn vmlinux
-sd /usr/src/linux
where XX are reported by view show as aXX_

Why would vtune report such a different cpu usage
than sar?

I never did figure out whether calibration itself was
the problem, or whether using calibration caused a
time-period mismatch between vtl and sar. At any rate,
"-cal no" now produces vtl idle reports similar to
sar's.
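
The time-period mismatch mentioned above is only a guess, but the arithmetic behind it is easy to sketch. The Python snippet below is a hypothetical illustration, not a model of what vtl actually does: it assumes the profiler's samples cover extra wall-clock time during which the CPUs sat idle, while sar covers only the busy run. The function name and all durations are made up for illustration.

```python
def blended_idle_fraction(windows):
    """Return the overall idle fraction across (duration_s, idle_fraction) windows."""
    total = sum(duration for duration, _ in windows)
    if total <= 0:
        raise ValueError("total duration must be positive")
    idle = sum(duration * fraction for duration, fraction in windows)
    return idle / total


# Hypothetical numbers, chosen only for illustration (not measured):
# sar observes just the 60 s busy run at 0% idle, while the profiler's
# samples also cover an extra 140 s in which the CPUs were fully idle.
sar_view = [(60, 0.0)]
profiler_view = [(60, 0.0), (140, 1.0)]

print(f"sar-style idle:      {blended_idle_fraction(sar_view):.0%}")
print(f"profiler-style idle: {blended_idle_fraction(profiler_view):.0%}")
```

If two tools cover different stretches of time, their idle percentages can disagree badly even when both are measuring correctly within their own window.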


Hey mwittle,

Very thorough: thanks for posting such an interesting experiment and results.

In a nutshell: VTune doesn't count anything the way, for example, a Geiger counter keeps a precise count of hits: tick tick tick tick tick.

VTune tracks statistically significant information about processor events which actually occurred, and it uses a sampling technology to do so.

Although VTune doesn't keep an exact count, it does a pretty darn good job of pointing you to where the processor was spending most of its time during your sampling session.
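
To make the counting-versus-sampling distinction concrete, here is a minimal Python sketch. It is a toy simulation, not VTune's actual mechanism: the timeline, the 70/20/10 weights, and the names app_work and kernel_io are assumptions invented for the example (only default_idle comes from the report above).

```python
import random


def exact_share(timeline, name):
    """Exact accounting: inspect every tick of the timeline."""
    return timeline.count(name) / len(timeline)


def sampled_share(timeline, name, interval):
    """Sampling: inspect only every `interval`-th tick and extrapolate."""
    samples = timeline[::interval]
    return samples.count(name) / len(samples)


# Hypothetical record of which function the CPU was in at each tick.
rng = random.Random(0)  # fixed seed so the run is repeatable
timeline = rng.choices(
    ["default_idle", "app_work", "kernel_io"],
    weights=[70, 20, 10],
    k=100_000,
)

exact = exact_share(timeline, "default_idle")
sampled = sampled_share(timeline, "default_idle", interval=50)
print(f"exact share of default_idle:   {exact:.3f}")
print(f"sampled share of default_idle: {sampled:.3f}")
```

The sampled figure comes from only 2,000 of the 100,000 ticks, so it is an estimate: close to the exact share, but not required to match it digit for digit.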

And you'd expect differences from more than just sar. Try this experiment: run a 20-second vtl session launching no application (leave -app out, keep it simple), and at the same time run "ps -ef" and then "ps -ef | wc -l".

The ps command shows you an exact count: everything listed in the process table, whether or not a CPU is running that process at the time. VTune only shows you statistically relevant information about code that the processor actually ran during that same time period.

The lists are showing two different things, on purpose and by design. And this model applies directly to other commands such as sar, which count precisely (as opposed to "sample").
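
The ps comparison can be sketched the same way. The Python snippet below is a hypothetical toy, not how ps or VTune is implemented: the process names, tick counts, and sampling interval are all invented. It contrasts a full listing of a process table with the set of processes a periodic sampler happens to catch on the CPU.

```python
from collections import Counter

# Hypothetical process table: name -> ticks it actually spent on the CPU
# during a 1000-tick window. Most processes are asleep the whole time.
run_ticks = {"init": 0, "sshd": 0, "cron": 0, "myapp": 600, "kswapd": 5, "bash": 1}
TOTAL_TICKS = 1000


def build_timeline(run_ticks, total_ticks):
    """Lay out who ran at each tick; leftover ticks are idle."""
    timeline = []
    for name, ticks in run_ticks.items():
        timeline.extend([name] * ticks)
    timeline.extend(["<idle>"] * (total_ticks - len(timeline)))
    return timeline


timeline = build_timeline(run_ticks, TOTAL_TICKS)

listed = sorted(run_ticks)        # ps-style view: every entry in the table
sampled = Counter(timeline[::10])  # sampler view: only what ran at sample time

print("listed: ", listed)
print("sampled:", dict(sampled))
```

Sleeping processes such as init never show up in the sampled view, and a process that ran for a single tick (bash here) can be missed entirely, while the listing reports all six regardless of activity.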

That said, we would never expect VTune to count events in a precise way; it's just not designed to do that. Does it still work like a champ? You bet.

Also, please note that the engineering team is very seriously considering turning the calibration default to OFF instead of ON in the next release of vtl (2.0, currently in beta), since it does seem to occasionally cause confusion, and can always be turned back on when needed.

Let me know if this helps.


