Collecting CPU Time for FLOPS calculation

I am trying to estimate the number of FLOP and FLOPS for my application using hardware event-based sampling (EBS) from the command line. I have placed __itt_pause() and __itt_resume() calls around my algorithm of interest. I run this command:

"C:/Program Files (x86)/Intel/VTune Amplifier XE 2015/bin64/amplxe-cl.exe" -collect-with runsa -knob event-config=FP_COMP_OPS_EXE.X87:sa=2000000 -start-paused --result-dir foo application.exe
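A minimal sketch of the arithmetic behind this kind of estimate, assuming each collected sample stands for roughly one sample-after interval of events (sa=2000000 in the command above). The sample count and elapsed time below are hypothetical placeholders, not measured values, and the function names are made up for illustration. Going by its name, the FP_COMP_OPS_EXE.X87 event covers x87 operations only, so any SSE or AVX floating-point work would need additional events.

```python
def estimate_event_count(samples: int, sample_after: int) -> int:
    """Estimate total event occurrences from a sample count.

    With event-based sampling, one sample is recorded roughly every
    `sample_after` occurrences of the event, so the total is approximately
    samples * sample_after.
    """
    if samples < 0 or sample_after <= 0:
        raise ValueError("samples must be >= 0 and sample_after must be > 0")
    return samples * sample_after


def estimate_flops(samples: int, sample_after: int, elapsed_seconds: float) -> float:
    """Estimate floating-point operations per second over the measured region."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return estimate_event_count(samples, sample_after) / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical inputs: replace with the sample count reported for the
    # event and the wall-clock time of the resumed (unpaused) region.
    samples = 1500
    sample_after = 2_000_000  # matches sa=2000000 in the command above
    elapsed_seconds = 2.0

    flop = estimate_event_count(samples, sample_after)
    flops = estimate_flops(samples, sample_after, elapsed_seconds)
    print(f"Estimated FLOP:  {flop}")
    print(f"Estimated FLOPS: {flops:.3e}")
```

The elapsed time should cover only the region between __itt_resume() and __itt_pause(); dividing by the whole run time would understate the rate.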

How is Basic Hotspots analysis different from Advanced Hotspots analysis?

I ran my application under Intel VTune to find the functions that consume the most time. For the same application, Basic Hotspots analysis and Advanced Hotspots analysis report different functions. Can you please explain the difference between these analysis types and what could cause this behavior?

No Data Shown After Compilation of C Source Code with OpenMP


I'm using Intel VTune Amplifier XE 2015, and I have no problem running the Tachyon sample code.

I was trying to analyze an executable generated by compiling C source code that uses the OpenMP API.

When I run Advanced Hotspots analysis, I can't view the results for the OpenMP regions. The view states "No Data To Show".

Below are the steps I took to run the analysis:

1) source /opt/intel/vtune_amplifier_xe_2015/

2) gcc -fopenmp -g Matrix.c -O2 -o Matrix.exe

3) export EDITOR=gedit

4) ./amplxe-gui

Loop Iteration Time using VTune CLI

Hi, I am running an OpenMP code on the Intel Xeon Phi. I want to profile the code with VTune Amplifier on Stampede to find the number of loop iterations and the number of distinct array accesses for each loop. I couldn't find the related events anywhere. I want to use the VTune command-line interface so that I can view the results in the VTune GUI installed on my local system. Can you kindly help me with the appropriate command?

Duration parameter of "collect"


I am trying to gather some system-wide hardware counters for my application, X seconds after it has started, over a period of Y seconds. I am using the following command line:

amplxe-cl --collect my_custom_conf -target-duration-type=veryshort -duration 30 -no-auto-finalize -no-summary -data-limit=0 -resume-after=20000

and I expect the collection to start after 20s and last for 30s.
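A small sketch of how the command line above could be generated for arbitrary X and Y. It only reuses the options already shown in the post and assumes, as the post does, that -resume-after takes milliseconds and -duration takes seconds. Whether the duration window includes the paused period is not settled here. The helper name build_collect_command is hypothetical.

```python
import shlex


def build_collect_command(config: str, delay_seconds: float, window_seconds: int) -> list:
    """Build the amplxe-cl argument list from the post for a given delay and window.

    Assumptions (taken from the post, not verified against the tool):
      * -resume-after is expressed in milliseconds
      * -duration is expressed in seconds
    """
    if delay_seconds < 0 or window_seconds <= 0:
        raise ValueError("delay must be >= 0 and window must be > 0")
    resume_after_ms = int(round(delay_seconds * 1000))
    return [
        "amplxe-cl",
        "--collect", config,
        "-target-duration-type=veryshort",
        "-duration", str(window_seconds),
        "-no-auto-finalize",
        "-no-summary",
        "-data-limit=0",
        f"-resume-after={resume_after_ms}",
    ]


if __name__ == "__main__":
    # X = 20 s delay, Y = 30 s window, as in the post.
    cmd = build_collect_command("my_custom_conf", 20, 30)
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems if the list is later passed to subprocess.run.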

I have two questions:

vtune_amplifier_xe_2013 + how to compile

I have vtune_amplifier_xe_2013. I used it a year ago to analyze the CPU time of my program.

I remember that it produced .dump and .xml files.

I no longer remember how to compile the program so that those files are generated.

I do not remember which flags I have to pass to ifort.

I can no longer find the guide, so I am now looking inside the vtune_amplifier_xe_2013 folder.



Intel VTune cannot collect information and causes a core dump

We are using Intel VTune 2015 to profile our application, which is running under operating system: 2.6.32-504.1.3.el6.x86_64 Red Hat Enterprise Linux Server release 6.6 (Santiago)
CPU: Intel(R) Xeon(R) E5/E7 v2 processor
Frequency          2800004679
Logical CPU Count  4
I started four instances of ngss.elf, which is our product.
# ps -ef|grep ngss
root       400 31483  0 07:34 pts/0    00:00:00 ./ngss.elf --iomn 294921

