Combining VTune MPI output


I am using VTune Amplifier XE to profile the MPI and hybrid performance of WRF. I use the following command to produce the outputs:

 ibrun -n 16 tacc_affinity amplxe-cl -collect hotspots -result-dir r001hs wrf.exe

This gives me 16 result directories. Is there any way to combine all the output files so I can compare which subroutines/modules are the most expensive? Or is there a way to produce just one output instead of many?
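One possible workaround, offered as a rough sketch rather than a built-in VTune feature: export a hotspots report from each per-rank result directory as CSV, then sum the time per function across ranks. The sketch below assumes each report has a header row with columns named `Function` and `CPU Time`; those column names, the helper `merge_hotspots`, and the sample rows are all assumptions made for illustration, so check them against the actual report output of your VTune version.

```python
import csv
import io
from collections import defaultdict


def merge_hotspots(csv_texts, name_col="Function", time_col="CPU Time"):
    """Sum per-function time across several CSV reports (one per MPI rank).

    name_col and time_col are assumed column names; adjust them to match
    the header of the reports you actually exported.
    """
    totals = defaultdict(float)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            totals[row[name_col]] += float(row[time_col])
    # Most expensive function first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data standing in for two per-rank reports.
    rank0 = "Function,CPU Time\nsolve_em,10.5\nmicrophysics,4.0\n"
    rank1 = "Function,CPU Time\nsolve_em,9.5\nradiation,2.0\n"
    for name, seconds in merge_hotspots([rank0, rank1]):
        print(f"{name}: {seconds:.1f}")
```

In practice you would read each exported CSV file into a string and pass the list of strings to `merge_hotspots`. Summing hides per-rank imbalance, so it may also be worth keeping the per-rank values if load balance matters.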

Unable to re-open results

I am running Amplifier 2015, update 4 on Windows 8.1.  I run an analysis on a Linux box.  It works perfectly and I am able to review the results.  After I close the results and try to open them again, I get the following error:


Intel VTune Amplifier XE 2015 has faced a serious problem
    6/10/2015 12:03:22 PM  Cannot open data: Intel VTune Amplifier XE 2015 has faced a serious problem. 
Error 0x40000016 (Corrupted product installation) 

Application Silently Stops

I am using VTune with libittnotify to identify and profile some task regions in a few programs. However, I am seeing weird behavior in some of them. When I enable "Analyze user tasks" in any analysis, the program stops executing "somewhere in the middle", yet the analysis session continues and finishes as if nothing had gone wrong (no warning or error message appears anywhere). When I disable "Analyze user tasks", the program executes to completion and everything works fine.

Concurrency vs Advanced Hotspot Analysis

I have a parallel application and I am profiling it with VTune Amp. XE 2015 on Ubuntu 14.10. My goal at this moment is to check how much time (and in which way / pattern) the threads of the program are executing in parallel (i.e., I want to analyze the concurrency of the program). However, I am seeing some results that I do not understand.

It all has to do with the interaction of three things: [setting] thread affinity, concurrency analysis, and advanced hotspot analysis.

With thread affinity disabled and doing concurrency analysis in VTune I see the following concurrency histogram:

Problem loading Advanced Hotspots results into GUI with 2015.4.0.410668


Loading Advanced Hotspots results into the GUI leads to an infinite loop of some sort, and eventually a dialog pops up saying VTune is not responding. Most of the results seem to come up on the Summary page, but the progress bar for "Collection and Platform Info" gets stuck at about 80%. amplxe-gui burns 100% CPU, apparently waiting for some mutex to be released; see strace below.

All other analysis types work OK. Displaying the results from the CLI client works OK, e.g.:

amplxe: Error: [Instrumentation Engine]: failed to create the detach process (clone failed)

I ran the command:

amplxe-cl -target-process dsd -collect hotspots
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /r000hs -command stop.

I then stopped the collection but got the following error:

^Camplxe: Error: [Instrumentation Engine]: failed to create the detach process (clone failed)
amplxe: Collection detached.
amplxe: Collection failed.
amplxe: Internal Error

Could anyone help? Thanks

Program is faster when launched from VTune


I've noticed that if I time my program launched from the command line, it is slower than when I launch it from VTune (~25sec vs ~18sec). I've already checked that I'm running the same binary with the same options. Also, when I run it from the command line I run it 5 times consecutively.

I've noticed that, when launched from the command line, the CPU stays in "slow" mode (at 1600MHz as measured by CPU-Z), but when launched from VTune the CPU enters "turbo mode" (going up to 2400MHz as measured by CPU-Z).
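To make the command-line side of this comparison repeatable, a small timing harness can help. It does not explain the frequency difference; it only makes the ~25sec vs ~18sec gap easier to reproduce and report. This is a minimal sketch: the helper `time_command` is a hypothetical name, and the workload in the example is a placeholder that should be replaced with the real program and its options.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the program exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder workload; substitute the actual binary and arguments.
    cmd = [sys.executable, "-c", "sum(range(100000))"]
    d = time_command(cmd, runs=5)
    print(f"min {min(d):.3f}s  median {statistics.median(d):.3f}s  max {max(d):.3f}s")
```

Reporting the minimum and median over several consecutive runs separates a consistent slowdown from a slow first run, which is useful context alongside the observed clock frequencies.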

