Parallel Amplifier confusing hotspot results


I am trying to use Parallel Amplifier to compare the performance of two different ways of building a particular application. In real-world use I can see that the proposed new build is faster on many workloads but slower on a few, and I am trying to investigate one of those slower cases. The tool is not helping. I ran Hotspots Analysis, using the VTune start/stop API to time only the actions of interest. In the results, IPA shows a particular function as taking 50-80% of the bottom-up time. The function shown does not seem like it could possibly be that slow, and the identity of the function changes each time I rerun the workload on either of the applications. What could be going wrong?

(I tried VTune first but for some reason it is not willing to generate any call stack data for the second executable, although it worked fine for the first one. The main difference between the two executables is that the second one uses msvcrt instead of libcmt.)

In case it matters, I am running IPA on Win7 Enterprise x64 on 64-bit executables; the VTune run was on a different machine with XP x64.

Perhaps you could paste a screenshot of what you are seeing?

Some observations: the VTune analyzer's sampling feature does NOT collect call stacks. Only the call graph feature can show calling sequences, but because of its overhead, it is not reliable for measuring actual performance.

I would warn you against using the VTune start/stop API (is that what you really meant?) or the VTune Pause/Resume APIs, at least initially. What do the results look like if you just run Parallel Amplifier on the whole app? Do they look correct?

Do the two versions of the application have the same executable filename? Do you have separate configurations in Visual Studio for them so that they reside in separate directories?

Regards, MrAnderson

MrAnderson,

First, just to be clear, I am comparing IPA hotspot analysis vs VTune call graph profiling. You're right, I use VTPause / VTResume, not start / stop. It's been so long since I submitted those calls to our source tree for use with VTune that I don't think about the function names anymore. But I was pleasantly surprised when I read in this forum (or a document linked from here) that the same functions are supposed to work with IPA, and they seem to work fine. The data I see mostly does look like it's limited to the portion of the run which interests me. The main problem seems to be that one function gets heavily overemphasized in the results, and which function that is changes in each run.
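
For readers following along, the pause/resume pattern being discussed can be sketched roughly as below. This is not the code from the poster's source tree. The functions pause_collection() and resume_collection() are hypothetical stand-ins that only record that they were called, so the sketch runs without the profiler installed; in a real build they would presumably forward to VTPause() and VTResume(). The CollectionScope guard, work_of_interest() and run_workload() are likewise illustrative names.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the collector control calls. They only log the
// call order; a real build would call the profiler's pause/resume functions.
static std::vector<std::string> g_calls;
static void pause_collection()  { g_calls.push_back("pause"); }
static void resume_collection() { g_calls.push_back("resume"); }

// RAII guard: collection is resumed on entry to a scope and paused again on
// exit, including early returns and exceptions.
class CollectionScope {
public:
    CollectionScope()  { resume_collection(); }
    ~CollectionScope() { pause_collection(); }
    CollectionScope(const CollectionScope&) = delete;
    CollectionScope& operator=(const CollectionScope&) = delete;
};

// Placeholder for the actions of interest: sums i*i for i in [0, n).
static long long work_of_interest(int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<long long>(i) * i;
    }
    return sum;
}

// Start paused so that setup is excluded, then profile only the guarded scope.
static long long run_workload() {
    pause_collection();
    long long result = 0;
    {
        CollectionScope scope;          // resume for the region of interest
        result = work_of_interest(1000);
    }                                   // paused again when scope ends
    return result;
}
```

Pairing the calls through a guard like this makes it harder to leave collection running (or paused) by accident, which is one thing worth ruling out when the profiled region looks wrong.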

The two versions of the application have different filenames. In VTune I am running them from the (unsupported but essential) command line using different VTune project files. In IPA I am running them in the same Visual Studio project file and editing the executable name between runs. I don't see why this should have any effect on the sampling itself though.

I am not sure a screenshot will help, but I can try to paste one later. Basically, each run shows about 120 seconds of bottom-up time associated with a single function, and all the rest of the functions have very little bottom-up time. Which function gets the 120 seconds varies between otherwise identical runs, even of the same executable.

Regards,
aap

So, are you running the latest version? Update 3 was posted a few weeks ago. That would be the first thing to ensure.

Next, if you would like us to check out your results, you could post a "private" reply and attach the zipped results directory (or directories) to the reply.

Regards, MrAnderson
