I wrote up this problem report, and then, after some more tinkering, discovered I could avoid the problem by turning off calibration, adding
"-cal no -si 1"
to my vtl command line.
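For concreteness, combining those workaround flags with the sampling invocation I quote below gives a full command line along these lines (all flags are ones from this report; this is how I believe they fit together, not official syntax):

```shell
vtl -d 60 -c sampling -cal no -si 1
```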
I thought I'd report it anyway, in case anyone knows more about the underlying cause, or in case anyone else is seeing something similar and wants a workaround.
VTUNE 1.1 reports different CPU utilization than sar(8)
HW: 4-way Xeon (hyper-threaded 8-way) IBM440
OS: SLES 8, kernel 2.4.19-64GB-SMP
SW: VTune 1.1 for Linux
I've been comparing the scaling of a fabric
I/O-bound application as multiple processes
run concurrently.
When I run one copy of the process, I run it
directly under vtl
vtl -d 60 -c sampling
When I run multiple copies of the process, I
start all but one instance from a tight shell
loop and put them in the background, and then
run the last one under vtl, as above.
In either case, I would also run sar 10 7 in the
background while the vtl was executing.
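In script form, the multi-instance runs looked roughly like this. This is a sketch only: "./myapp" would be the actual application (not named here), and "sleep" stands in for it so the snippet runs as-is; the sar and vtl lines are shown commented where they would go.

```shell
#!/bin/sh
# Sketch of the launch pattern described above. "sleep" is a
# stand-in for the real I/O-bound application.
APP="sleep 0.2"              # in practice: path to the application
N=4                          # total instances (I used 1 and 20)

i=1
while [ "$i" -lt "$N" ]; do  # start N-1 copies in the background
    $APP &
    i=$((i + 1))
done

# sar 10 7 > sar.out &       # system-wide utilization, in parallel
# vtl -d 60 -c sampling      # the last instance ran under vtl
$APP                         # stand-in for the profiled instance

wait                         # reap the background copies
echo "launched $N instances"
```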
With 1 instance of :
- reports throughput of io/sec
- sar reports: 4% usr, 4% system, 92% idle
- vtune reports 70% of event samples in the
function default_idle in the vmlinux module
With 20 instances of :
- reports throughput of <3X> io/sec
- sar reports: 23% usr, 77% system, 0% idle
- vtune reports 70% of event samples in the
function default_idle in the vmlinux module.
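For what it's worth, sar's percentages come straight from deltas of the kernel's tick counters (the "cpu" line in /proc/stat: user, nice, system, idle jiffies). A quick sketch with made-up counter values, chosen so the result matches my 1-instance sar numbers:

```shell
#!/bin/sh
# Two illustrative samples of the /proc/stat "cpu" counters,
# taken some interval apart (values invented for the example).
b_usr=400; b_nice=0; b_sys=400; b_idle=9200    # before
a_usr=440; a_nice=0; a_sys=440; a_idle=10120   # after

d_usr=$((a_usr - b_usr + a_nice - b_nice))
d_sys=$((a_sys - b_sys))
d_idle=$((a_idle - b_idle))
total=$((d_usr + d_sys + d_idle))

echo "usr:  $((100 * d_usr / total))%"
echo "sys:  $((100 * d_sys / total))%"
echo "idle: $((100 * d_idle / total))%"
```

VTune's 70% figure, by contrast, is a fraction of event samples attributed to a function, which should only track these tick-based percentages if samples are spread uniformly in time; if calibration was varying the sample interval mid-run, that could be one way for the two to diverge (speculation on my part).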
The vtl command I'm using to view the results is
vtl view aXX::r1 -ha -mn vmlinux
where XX is the activity number reported by "view show" as aXX.
Why would vtune report such different CPU usage than sar does?
I never did figure out whether calibration itself was
the problem, or whether calibration caused a time-period
mismatch between vtl and sar. At any rate, "-cal no"
now produces vtl idle reports similar to sar's.