Intel® Parallel Amplifier

“memory bound” metric in VTune

Hello,

I am analyzing the “memory bound” metric of my code with VTune. According to section B.3.2.3 of the "Intel® 64 and IA-32 Architectures Optimization Reference Manual":

%L2 Bound =(CYCLE_ACTIVITY.STALLS_L1D_PENDING - CYCLE_ACTIVITY.STALLS_L2_PENDING)/CLOCKS

But in my VTune results, CYCLE_ACTIVITY.STALLS_L1D_PENDING is smaller than CYCLE_ACTIVITY.STALLS_L2_PENDING. Why?
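
As a sanity check, the formula above can be evaluated directly from the raw event counts. The following is only a minimal sketch: the helper name `l2_bound_percent` and its "-1.0 means inconsistent input" convention are hypothetical, not part of VTune or the manual. It flags the situation described here (the L1D count being smaller than the L2 count) rather than explaining it.

```c
#include <stdio.h>

/* Hypothetical helper (not a VTune API): evaluates
 *   %L2 Bound = (STALLS_L1D_PENDING - STALLS_L2_PENDING) / CLOCKS
 * from raw event counts and scales the ratio to a percentage.
 * Returns -1.0 when the inputs are inconsistent: zero clocks, or an
 * L1D-pending count smaller than the L2-pending count, which would
 * make the metric negative. */
double l2_bound_percent(unsigned long long stalls_l1d_pending,
                        unsigned long long stalls_l2_pending,
                        unsigned long long clocks)
{
    if (clocks == 0 || stalls_l1d_pending < stalls_l2_pending)
        return -1.0;
    return 100.0 * (double)(stalls_l1d_pending - stalls_l2_pending)
                 / (double)clocks;
}
```

With made-up counts, `l2_bound_percent(300, 100, 1000)` yields 20.0, while swapping the first two arguments yields the -1.0 marker, which is the case reported in this post.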

ittnotify in xeon phi offload regions

I'm trying to call the functions __itt_pause() and __itt_resume() from within offload regions so that I sample only specific sections of my code in a knc-bandwidth collection. However, I'm running into linking problems that I can't figure out. I am using Intel(R) VTune(TM) Amplifier XE 2015 Update 1.

My code looks like:

#pragma offload_attribute(push,target(mic))
#include <ittnotify.h>
#pragma offload_attribute(pop)

...

#pragma offload target(mic:offload_target)
{
    __itt_resume();
}

My make options are:

A 'Failed to create sampling data base' Problem

I am new to VTune and am using VTune Performance Analyzer v9.1 on Windows XP (Intel Pentium G3420).

When I try to log Clockticks information, VTune always shows the error "Failed to create sampling data base. probably .tb5 files are corrupted or don't exist".

When using the "Quick Performance Analysis Wizard", the "Clockticks" column is always zero ('0'), while the other columns seem normal.

By the way, the program being analyzed was developed with Visual Studio 2008 and MFC.

Can anyone help me with this?

Thank you.

CentOS 7 kernel oops when running

When evaluating vtune_amplifier_xe_2015.1.0.367959 on Linux, I experienced a kernel oops in the VTune kernel modules. I was trying to run the microarchitecture -> general exploration -> bandwidth test on a default CentOS 7 x86 install updated with all patches. The code was running on an SNB machine with the VTune CLI_install installed as per the manual.

sfdump5 tool in VTune

There seems to be some mention of a command-line sfdump5 tool that can be used to process/view the samples within a .tb6 file. There is also documentation of this tool in the 3.11 revision of the SEP User Guide.

However, I can't seem to find this tool in the latest VTune Amplifier XE installation - C:\Program Files\Intel\VTune Amplifier XE 2015\bin32.

Has this sfdump5 tool been deprecated?

insufficient virtual memory

I am getting an error while loading a result from a general exploration experiment.  This is an MPI executable, but collection was only done for one process on each node.  See attached PNG for the error message.  Any ideas?

limit
cputime      unlimited
filesize     unlimited
datasize     4096000 kbytes
stacksize    7340032 kbytes
coredumpsize unlimited
memoryuse    1024000 kbytes
vmemoryuse   unlimited
descriptors  65536
memorylocked unlimited
maxproc      600

Call stack mechanism implementation question

I am running a Go program with DWARF information. VTune does a good job of figuring out line numbers and so forth, but it struggles with stack walks. I am guessing that this is because Go's stack conventions (how Go uses EBP, for example) differ from those supported by VTune. Is there a document or some sort of cheat sheet describing what VTune expects from the stack format? Also, can anyone think of a workaround that doesn't require Go to change its conventions?

Profiling an application which uses SIGNALS

Hello,
We are using Intel VTune 2015 to profile our application, which runs under CentOS 5.11.
Our application uses C++ signals for control flow. We tried a basic hotspots analysis using the amplxe-cl command-line tool with the following parameters:
-duration 20 --run-pass-thru=--profiling-signal=1
VTune yields the following error message when detaching after the 20-second duration. As an alternative to signal number 1, I also tried number 4, with no change in the results.
