Getting call stack info in VTune when profiling native application on MIC

I apologize if this was answered somewhere else, but I couldn't find an answer in the VTune tutorials or on this forum ...

I'm trying to profile a native application running on Phi using VTune Amplifier. I'm following the suggestions in Hands-on Lab: Optimizing Monte Carlo on Intel Phi. I've compiled my application with the flags "-g -shared-intel -shared-libgcc -debug inline-debug-info". In the VTune project properties I've specified Application=ssh and Application Parameters=<name of the script on mic0 to execute>

The application runs fine and VTune collects data, but in the Bottom-up view I can't get the call stack information for my application (see attached screenshot). I'm using the latest VTune Amplifier XE 2013 and Intel compiler v 13.1.3.

Any suggestions for getting call stack information in VTune, or any other techniques I should use for profiling an application on Phi?

Attachment: vtune-screenshot-mic.jpg (image/jpeg, 316.39 KB)

Last I checked (as of Intel VTune Amplifier XE 2013 Update 11), collecting call stack information on the coprocessor using Intel VTune Amplifier XE is not supported. I could submit a feature request for this to the developers on your behalf, if you would like that. However, with that said, I cannot make any promises as to if and when this feature will be implemented. 

Having call stack information available will be extremely helpful when profiling complex applications. Please submit this feature request. Thank you!

Hi Victor, 

I have filed a feature request for Intel VTune Amplifier XE: 6000025555

On further investigation, I found that ITAC (Intel Trace Analyzer and Collector) can be used to trace any source code. Here are the basic steps to use ITAC to view the call stack:

- Install ITAC on the machine. ITAC is not available as a stand-alone package; it ships as part of Intel Cluster Studio XE.
- Source the environment script from the <install_dir>/itac/<version>/bin/ directory; this sets up your environment.
- Recompile your application with the -tcollect and -mmic switches. This compiles for the coprocessor in native mode and links in the trace collector libraries.
- Make sure the trace collector libraries are available on the card.
    o Mainly, the files under <install_dir>/itac/<version>/mic/slib/* should be copied to the card, under /lib64.
- Run your application on the card. The trace collector will create a few files named <exe_name>.stf*.
- Those files are created in the directory where you ran your executable; transfer them to the host.
- View the *.stf* files in the GUI by typing "traceanalyzer <exe_name>.stf". This opens an X application, so make sure your display is set correctly.
- Once the GUI is open, you'll see a blue area called "Application" on the front page. Right-click it and select "Ungroup Application" to show your routines.

I hope this helps.

Thanks! Can I get ITAC if I have an Intel CPP Studio XE license?

ITAC is provided with the Intel Cluster Studio XE 2013 (the latest version available here if you want to get it):

This document should help you visualize how Intel packages the various products:

As you will see, the Intel Parallel Studio XE license does not cover the ITAC installation.

I was able to get and install ITAC on my host. But now I'm running into linker errors (command: "icpc ..... -mmic -pthread -tcollect"):
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/ when searching for -lVT
x86_64-k1om-linux-ld: cannot find -lVT
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/ when searching for -ldwarf
x86_64-k1om-linux-ld: cannot find -ldwarf
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/ when searching for -lelf
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/ when searching for -lvtunwind
x86_64-k1om-linux-ld: cannot find -lvtunwind

P.S. to my earlier comment....

Since there is no directory "<install_dir>/itac/<version>/bin/" on my host, as the instructions above assume, I ran the command "source <install_dir>/itac/<version>/intel64/bin/" instead.

Hi Victor, 

Sorry for the delayed response. To compile the application correctly, use the following compile line:

mpiicpc -tcollect=VTcs -mmic hello.cpp 

The above command line uses VTcs instead of the default VT. This is the library designed to work with non-MPI programs. You will also need to manually initialize and finalize the collection, which normally happens in MPI_Init and MPI_Finalize. To do this, add the following two calls to your program.

To initialize, call:

int VT_initialize (int * argc, char *** argv)

To finalize, call:

int VT_finalize(void)

For the most complete collection, I recommend putting these at the very beginning and very end of the program. You will need to include VT.h wherever you use these calls.
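
One way to keep the two calls paired is a small scope guard. The following is only a sketch: the class name TraceSession and the USE_ITAC macro are hypothetical and not part of ITAC, and treating a return value of 0 as success is an assumption to verify against VT.h. Without USE_ITAC, no-op stand-ins with the signatures quoted above are used, so the same source still builds on a machine that has no ITAC installed.

```cpp
// Sketch: start trace collection when the guard is created and stop it when
// the guard goes out of scope. Build with -DUSE_ITAC (together with
// -tcollect=VTcs -mmic) to call the real Trace Collector.
#ifdef USE_ITAC
#include <VT.h>
#else
// Hypothetical no-op stand-ins, same signatures as quoted in the post.
static int VT_initialize(int*, char***) { return 0; }
static int VT_finalize(void) { return 0; }
#endif

class TraceSession {  // hypothetical helper, not an ITAC class
public:
    TraceSession(int* argc, char*** argv)
        : status_(VT_initialize(argc, argv)) {}

    // Finalize only if initialization succeeded.
    ~TraceSession() {
        if (ok()) VT_finalize();
    }

    // Non-copyable so VT_finalize is called at most once per session.
    TraceSession(const TraceSession&) = delete;
    TraceSession& operator=(const TraceSession&) = delete;

    // Assumption: 0 is the success code; check VT.h for the exact constants.
    bool ok() const { return status_ == 0; }

private:
    int status_;
};
```

To use it, make "TraceSession trace(&argc, &argv);" the first statement in main. The destructor then runs when main returns. Note that a program leaving through exit() does not destroy local objects, so in that case VT_finalize would have to be called explicitly before exiting.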

Lastly, you will need to source the mic version of the tools. Hence, you will need to source <install_dir>/itac/<version>/mic/ 

Let me know if you have any more questions.


Thank you for the instructions. I was able to compile my program without errors, and I got it running on Phi (although it ran extremely slowly). But in the middle of the run the program terminated unexpectedly with the following messages:

[0] Intel(R) Trace Collector INFO: 46.06MB trace data in RAM + 454.00MB trace data flushed = 500.06MB total
[0] Intel(R) Trace Collector INFO: 26.88MB trace data in RAM + 973.25MB trace data flushed = 1000.12MB total
[0] Intel(R) Trace Collector INFO: 8.06MB trace data in RAM + 1492.12MB trace data flushed = 1500.19MB total
[0] Intel(R) Trace Collector INFO: 54.25MB trace data in RAM + 1946.00MB trace data flushed = 2000.25MB total

It also didn't produce any *.stf files. My executable runs fine when it's compiled without the Trace Collector.

Hi Victor, 

Is the trace file being written to a shared filesystem or the coprocessor's filesystem? It is possible that the available space is being filled up. The flushed data is by default written to /tmp, which is normally not a shared filesystem. You can change this by setting VT_FLUSH_PREFIX to point to a shared filesystem.
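
As a rough pre-flight check, the program could verify that the flush directory has room before a long run. This is a sketch only: the function names are hypothetical, it assumes VT_FLUSH_PREFIX names a directory, and the amount of space actually needed depends on the run and is not something the collector reports in advance.

```cpp
#include <string>
#include <sys/statvfs.h>  // POSIX statvfs

// Directory the collector is expected to flush to: the value of
// VT_FLUSH_PREFIX if it is set and non-empty, otherwise /tmp
// (the default named in the reply above).
std::string flush_dir(const char* env_value) {
    return (env_value && *env_value) ? std::string(env_value)
                                     : std::string("/tmp");
}

// Bytes available to an unprivileged process in dir, or -1 on failure.
long long free_bytes(const std::string& dir) {
    struct statvfs fs;
    if (statvfs(dir.c_str(), &fs) != 0) return -1;
    return static_cast<long long>(fs.f_bavail) *
           static_cast<long long>(fs.f_frsize);
}

// Hypothetical helper: true if dir can hold at least needed_bytes.
bool has_room(const std::string& dir, long long needed_bytes) {
    const long long avail = free_bytes(dir);
    return avail >= 0 && avail >= needed_bytes;
}
```

A call such as has_room(flush_dir(std::getenv("VT_FLUSH_PREFIX")), 4LL << 30) (with <cstdlib> included) would check for 4 GB, where the figure is an arbitrary placeholder. If the flush directory is backed by RAM, flushed trace data also competes with the application for memory.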

Hi Sumedh,

No files appeared on the Phi's local filesystem under /tmp or under /root. It is possible that the Phi ran out of memory. My program uses almost all of the available RAM when it runs. How large should I expect the *.stf files to be?
