vmlinux reported as using 98% of CPU_CYCLES

I am using VTune 3.0 on Red Hat Enterprise Linux 3, Update 5. The machine has two Itanium 2 processors.

I do:
vtl activity run1 -c sampling -o "-ec en=CPU_CYCLES" -d 20 -app ./a.out,"args" run
vtl view | more

The view says that Module vmlinux, Process pid0x0 is taking roughly 98% of the CPU cycles, even though my app is CPU intensive and runs for 20 seconds.

When I took the "Tuning for the Intel Itanium 2 Microarchitecture" course, I didn't have this problem.

I do not have a Windows workstation connected to the Linux box, so I must use the vtl interface.

Any ideas?

Jim Fry
Hewlett-Packard

Nice posting, JF.

It might be interesting to turn calibration on and see whether the numbers change. If memory serves, that means adding -cal on or -calibration on to the options setting of the vtl command. Full info at:

$ man sampling

Also, don't forget that you can use the plug-in viewers on Itanium Linux even though you don't have a full GUI with wizards. To invoke the plug-in viewers, just add the -gui option:

$ vtl view # show me my data as ASCII text

$ vtl view -gui # show me the same data in the graphical viewer

cheers
jdg

PS: meant to ask: is the OS you're using listed in the release notes? Their default location is:

/opt/intel/vtune/RELEASENOTES.htm

If you're on an unsupported OS or kernel, you might need to open a premier case and see what engineering has to say about your setup.

Message Edited by jdgallag on 10-27-2005 12:36 PM

JD,

I tried adding -o "-cal yes", and the problem remained.

The calibration is a good idea (although I probably should set it myself, rather than use auto). But the problem remains that vtl is still reporting that vmlinux is taking much more time than it could really be taking, and so the rest of the output is suspect.

Jim Fry

Supported OS? Supported kernel?

jdg

The release notes say Red Hat Enterprise Linux 3.0 is supported, which is what is on this Itanium 2 system. But the kernel is 2.4.21-32.EL, not 2.4.21-4.EL as is listed under the Itanium supported kernels.

Jim Fry

A couple of things:
Are you sure the application is actually being run by VTune? You should see its output. (The reason I raise this is that the only times I have seen the kernel taking all the cycles is when I "fat-fingered" a path, the name of the app, or some such, or the path didn't include the working directory with the required data files. Given my typing, this happens to me a lot. :-)

You can use the remote data collector and the VTune GUI on an IA32 box. You don't need a Windows box to display the data in the GUI.

levinth,

Yes, I am sure the app is running. I get output, including reports of CPU time consumed.

It would not be easy for me to get access to an IA32 box on the network the Linux box is on.

I think that I'll report this problem through Premier support.

Jim Fry

It sounds to me like what you are seeing is that one of the processors is not being used during the run. Pid 0x0 is the idle process and, unless your app is multi-threaded, your app is probably consuming all of one processor, while the other is sitting in the idle process of the kernel. The missing 2% is the 2% that your app isn't using on the one processor. Using the graphical viewer, you should be able to separate the samples by processor (see CPU button) and make this determination.

Message Edited by DaveA on 11-02-2005 11:33 AM

Regards,
MrAnderson

DaveA,

I don't think that can be the explanation: in that case, the sum of all the CPU percentages would approach 200%, not the 100% I am seeing.
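
To make the two readings concrete, here is a minimal Python sketch of the arithmetic. It assumes (this is an assumption, not something taken from the VTune documentation) that both processors accumulate CPU_CYCLES at the same rate whether busy or idle, and that the app is single-threaded and saturates one processor. The function name and the two normalization modes are hypothetical.

```python
def idle_share(num_cpus, busy_cpus, per_cpu_normalization):
    """Return the idle-process share of CPU_CYCLES, in percent.

    Assumption: every processor contributes the same number of cycles,
    whether it runs the app or sits in the kernel idle loop.
    """
    idle_cpus = num_cpus - busy_cpus
    if per_cpu_normalization:
        # Each CPU is reported on its own 0-100% scale, so the grand
        # total can reach num_cpus * 100%.
        return 100.0 * idle_cpus
    # All samples are pooled and the grand total is scaled to 100%.
    return 100.0 * idle_cpus / num_cpus


pooled = idle_share(2, 1, per_cpu_normalization=False)
per_cpu = idle_share(2, 1, per_cpu_normalization=True)
print(pooled, per_cpu)
```

Under these assumptions, one idle processor out of two accounts for either half of a 100% total or 100 points of a 200% total. Neither reading yields 98% of a 100% total.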

I can't do vtl view -gui. I think it is only supported on IA32 machines.

Jim Fry

Absolutely not true.

vtlec = IA32 Linux and EM64T Linux only

vtl view -gui = IA32 Linux, EM64T Linux, and Itanium Linux

1) does vtl view (without the gui option) give results?

2) if so, something is wrong with the X setup on the server, because those plug-in viewers are not Eclipse-based, but they certainly are X applications

cheers

jdg

jdg,

My mistake. I must have fat fingered something the last time I tried.

Now when I use the CPU button, I see a total of 26 billion events (CPU cycles) on CPU 1 (all processes), but fewer than 2 billion on CPU 2. Why the disparity? IA64_INST_RETIRED-THIS is 65G vs. 4G. It appears that the second CPU is undercounting.
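
For scale, the reported counts can be turned into ratios. This is only a sketch of the arithmetic in Python, using the rounded figures quoted above; the variable names are illustrative, and the CPU 2 cycle count is an upper bound, so the true cycle ratio is at least the value computed here.

```python
# Rounded event totals as quoted in this post (all processes).
cpu1_cycles = 26_000_000_000   # CPU_CYCLES on CPU 1
cpu2_cycles = 2_000_000_000    # CPU_CYCLES on CPU 2 (reported as under 2 billion)
cpu1_inst = 65_000_000_000     # IA64_INST_RETIRED-THIS on CPU 1
cpu2_inst = 4_000_000_000      # IA64_INST_RETIRED-THIS on CPU 2

# Ratio of CPU 1 to CPU 2 for each event.
cycle_ratio = cpu1_cycles / cpu2_cycles   # lower bound on the real ratio
inst_ratio = cpu1_inst / cpu2_inst

print(f"cycles: at least {cycle_ratio:.2f}x, instructions: about {inst_ratio:.2f}x")
```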

Jim Fry

VTune doesn't really count things in a precise, Geiger-counter kind of way, click click click. Its sampling methodology determines statistically relevant information with regard to CPU activity. The numbers you collect in a given experiment can vary for a variety of reasons, even though the data is statistically valid.
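
As a rough illustration of what "statistical" means here: in event-based sampling, a sample is taken each time a hardware counter reaches the sample-after value, so an event total is an estimate (samples multiplied by sample-after) rather than an exact count. The Python sketch below is an idealized model with made-up numbers and hypothetical function names. It is not VTune's implementation.

```python
def take_samples(true_events, sample_after):
    """Samples recorded if one sample fires every `sample_after` events."""
    return true_events // sample_after


def estimate_events(samples, sample_after):
    """Reconstruct an approximate event total from the sample count."""
    return samples * sample_after


true_events = 26_000_000_123   # made-up "real" cycle count
sample_after = 1_000_000       # made-up sample-after value

samples = take_samples(true_events, sample_after)
estimate = estimate_events(samples, sample_after)
error = true_events - estimate  # events that never triggered a sample

print(samples, estimate, error)
```

In this idealized model the error is always smaller than one sample-after interval. Real collections add other sources of variation, such as interrupt skid and differences in system activity from run to run.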

If you get a moment, try repeating your experiment with calibration turned on. Is there a difference in the report? Step two: manually alter the "sample after" value in your collector in the GUI. Compare, and let us know what you see.

cheers

jdg

Message Edited by jdgallag on 11-10-2005 10:41 AM

jdg,

The reason I submitted this in the first place was because the data were NOT statistically valid. I'm trying to figure out why.

When I turn calibration on, the difference between the 2 CPUs is now a factor of two, rather than 16. That is, numbers of cycle and instruction "events" are twice as big on the first CPU as the second.

I couldn't figure out how to change "sample after" in the GUI. Are you sure that is what you want?

Jim

Well said, Jim. Of course, since VTune has successfully supported sampling on multiple CPUs for years and years now, I have a fairly knee-jerk reaction to trust the results I see. (You are wise not to, but I just wanted you to understand my reasons for double- and triple-checking.) Yes, you might be right about there being a bug, but it still seems unlikely to me.

Don't worry about changing the sample after value for now, although you can do that from the CLI or the GUI.

In general and especially working with highly optimized code, your optimizing compiler can do things to the execution of the code you wrote that you may not expect. I've assumed this may be the case here.

HOWEVER, let's assume for now you're right, and there is a clear bug. I suggest you open a premier case describing the problem, pack up the project that shows what you're seeing (creating a .tb5 file), and attach that .tb5 file to your case.

Cheers
jdg

Message Edited by jdgallag on 11-11-2005 01:35 PM

jdg,

I had already been using premier support, as I indicated I would earlier in the discussion thread.

They eventually asked me to try 8.0.

I just had 8.0 installed, and this problem seems to have gone away.

Thanks,
Jim

VERY interesting, and thanks for checking back to let the team know.

cheers

jdg
