Intel® VTune™ Amplifier XE

Announcing the Intel® Parallel Studio XE 2016 Beta!

You may have received an email inviting you to the Intel® Parallel Studio XE 2016 Beta.  VTune Amplifier XE 2016 beta is part of the studio and adds OpenMP* parallelization inefficiency, imbalance and work sharing analysis to tune for more efficient use of parallel regions. It also now supports multi-rank analysis of MPI* compute nodes with or without OpenMP use.  Various ease-of-use enhancements include confidence indicators in General Exploration analysis results, "super tiny" bird's-eye view timeline, and "Platform" tab replacing "Tasks and Frames" tab.

How to profile MPI processes on all nodes?

VTune(TM) Amplifier XE 2015 can analyze MPI processes, including those in hybrid codes, on a cluster system. That is, VTune Amplifier runs a parallel MPI program on N ranks to collect performance data, then identifies which hot function on which rank consumed the most CPU time.
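For orientation, a launch of this kind generally wraps the collector inside the MPI launcher, so that every rank runs under its own collector instance. The sketch below only composes and prints such a command line; it follows the same pattern as the mpiexec command quoted in another post on this page. The values of NRANKS, RESULT_DIR and APP are hypothetical placeholders, and the helper name build_cmd is made up for illustration.

```shell
#!/bin/sh
# Sketch only: compose an MPI launch line that runs each rank under the
# VTune command-line collector. NRANKS, RESULT_DIR and APP are hypothetical
# placeholders, not values taken from the original post.
NRANKS="${NRANKS:-4}"
RESULT_DIR="${RESULT_DIR:-my_result}"
APP="${APP:-./my_mpi_app}"

# build_cmd <ranks> <result-dir> <app>: print the launch line without running it.
build_cmd() {
    printf 'mpiexec -n %s amplxe-cl -result-dir %s -collect hotspots -- %s\n' \
        "$1" "$2" "$3"
}

# Dry run by default: show what would be executed.
build_cmd "$NRANKS" "$RESULT_DIR" "$APP"

# Execute only when explicitly requested and only if both tools are on PATH.
if [ "${RUN:-0}" = "1" ]; then
    if command -v mpiexec >/dev/null 2>&1 && command -v amplxe-cl >/dev/null 2>&1; then
        mpiexec -n "$NRANKS" amplxe-cl -result-dir "$RESULT_DIR" -collect hotspots -- "$APP"
    else
        echo "mpiexec or amplxe-cl not found; set up the tool environments first" >&2
        exit 1
    fi
fi
```

By default the script is a dry run; setting RUN=1 executes the printed command, assuming the compiler, MPI and VTune environments have already been sourced. How per-rank results are named and opened afterwards depends on the VTune version and is not shown here.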

First of all, set up the tools' environment. These tools are from Intel Cluster Studio XE 2015 (for example):

1. Intel Composer XE

$ source /opt/intel/ics/2015.0.3.032/composer_xe_2015/bin/ intel64

2. Intel MPI Library 

"-collect-with runsa -knob event-config" only works with Basic Performance Tuning Events

For example


works fine. But for many others, such as MEM_UNCORE_RETIRED.REMOTE_DRAM,

amplxe-cl gives an error like:

amplxe: Error: Cannot configure sampling event groups. The collection is terminated.

Could anyone help? Thanks

Windows XE 2015: "Accurate CPU time detection was disabled. Trace session is already in use"

I am using Amplifier XE 2015 on Windows 7 and trying to profile 4 MPI processes running on my local machine. I get the above message 3 times when running 4 MPI processes. Is that expected? It seems that XE has problems profiling multiple MPI processes at the same time.

mpiexec -n 4 amplxe-cl -result-dir my_result_ah -collect hotspots -- <my_exe.exe>

Estimating FLOPS

I have an Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz, which is a Haswell-based processor. I want to estimate the FLOPS of an application. I am using Intel VTune Amplifier XE 2015. Does anybody know how to find FLOPS?

I tried following steps on but I can't find the Processor Event Name on the pages in VTune. Has anybody successfully done this on a Haswell processor?

