Intel VTune is very slow in finalizing results (Linux)


I'm using Intel VTune Amplifier 2015 (Linux version). The sampling time for my workload is 180 seconds. I built my software with debug symbols enabled.

In VTune -> Project Properties, I gave the paths to the build, the source files, and the symbols. When I click Re-resolve, VTune takes more than an hour to finalize and display results. The progress bar reaches 30% and stays stuck there, showing "Finalizing results" for more than an hour.

What is the problem here? Why does it take so long to display results when I hit Re-resolve?

Using Intel® VTune™ Amplifier XE to Tune Software on the 5th generation Intel® Core™ processor family

Download this guide (see Article Attachments, below) to learn how to identify performance issues on software running on the 5th generation Intel® Core™ processor family (based on Intel® Microarchitecture Codename Broadwell). The guide explains the General Exploration Analysis viewpoint available in Intel® VTune™ Amplifier XE. It also walks through some of the most common performance issues that the VTune Amplifier XE interface highlights, what each issue means, and some suggested ways to fix them.

  • Developers
  • Professors
  • Students
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • C#
  • C/C++
  • Fortran
  • Intel® VTune™ Amplifier XE
  • Intel VTune Amplifier
  • app performance tools
  • application optimization
  • Development tools
  • Optimization
  • Parallel computing
  • Threading
  • Blocks of different sizes in ScaLAPACK?

    I am performing a Cholesky factorization with Intel MKL, which uses ScaLAPACK. I distributed the matrix based on this example, where the matrix is distributed in blocks of equal size (i.e. Nb x Mb). I tried to give every block its own size, depending on which process it belongs to, so that I can experiment more and maybe get better performance.
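For readers unfamiliar with the equal-size layout the post describes, here is a minimal Python sketch of a 2D block-cyclic distribution. The function names `element_owner` and `numroc` and their parameter names are hypothetical choices for this illustration; `numroc` follows the logic of ScaLAPACK's NUMROC tool routine. The ScaLAPACK array descriptor stores one row block size and one column block size per matrix, so in the standard layout only the trailing partial blocks differ in size.

```python
def element_owner(i, j, mb, nb, nprow, npcol):
    """Process grid coordinates (prow, pcol) that own global element (i, j).

    Assumes 0-based indices and that the first block lives on process (0, 0).
    mb x nb is the block size, nprow x npcol is the process grid shape.
    """
    return ((i // mb) % nprow, (j // nb) % npcol)


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-long dimension owned by process iproc.

    nb is the block size along that dimension, isrcproc is the process holding
    the first block, and nprocs is the number of processes along the dimension.
    """
    # Distance of this process from the one holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of full blocks
    num = (nblocks // nprocs) * nb         # full rounds every process receives
    extrablks = nblocks % nprocs           # leftover full blocks
    if mydist < extrablks:
        num += nb                          # one extra full block
    elif mydist == extrablks:
        num += n % nb                      # the trailing partial block
    return num


if __name__ == "__main__":
    # 10 rows, block size 3, 2 process rows: blocks go 0, 1, 0, 1.
    for p in range(2):
        print("process row", p, "owns", numroc(10, 3, p, 0, 2), "rows")
```

The per-process local sizes always sum to the global dimension, which is a handy sanity check when experimenting with different block sizes.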

    can SEP co-exist with perf driver?

    If a system has the perf driver installed, can we also install the SEP driver? I assume VTune first checks for SEP and uses it if it finds it. If it can't find SEP, I assume it looks for a compatible perf driver, correct?

    So there should be no issues, and nothing special required, to use VTune on a system with both the SEP and perf drivers?


    cblas_dnrm2 much slower than cblas_ddot

    Dear all,

    I ran benchmarks on a Sandy Bridge Intel processor (E5-4620) using Intel MKL 11.1. I found that cblas_dnrm2 is significantly slower (3.4 s) than the corresponding cblas_ddot call (0.5 s) using one thread. This is very surprising to me, because if I use cblas_ddot to calculate the 2-norm, it is faster (0.3 s) than cblas_dnrm2.

    I compiled with gcc-4.8.3 with the following flags:

    CXXFLAGS += -O3 -I${MKLROOT}/include
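To make the comparison in the post concrete, the following Python sketch shows the two ways of computing a 2-norm: as the square root of a dot product, and with a scaled sum of squares in the style of the classic reference BLAS DNRM2. The function names are hypothetical, and this is plain Python, not MKL. Whether MKL 11.1 uses a similar scaling scheme internally is an assumption, but such overflow protection is one plausible reason a dedicated norm routine can do more work per element than a dot product.

```python
import math


def norm2_via_dot(x):
    """2-norm as sqrt(x . x). Fast, but x[i]**2 can overflow or underflow."""
    return math.sqrt(sum(v * v for v in x))


def norm2_scaled(x):
    """2-norm via a scaled sum of squares (reference-BLAS-style sketch).

    Keeps the running sum relative to the largest magnitude seen so far,
    so intermediate values stay in range at the cost of a division and a
    branch per element.
    """
    scale = 0.0
    ssq = 1.0
    for v in x:
        if v != 0.0:
            a = abs(v)
            if scale < a:
                # New largest magnitude: rescale the accumulated sum.
                ssq = 1.0 + ssq * (scale / a) ** 2
                scale = a
            else:
                ssq += (a / scale) ** 2
    return scale * math.sqrt(ssq)


if __name__ == "__main__":
    small = [3.0, 4.0]
    print(norm2_via_dot(small), norm2_scaled(small))
    huge = [1e200, 1e200]
    # The dot-product version overflows to inf; the scaled version does not.
    print(norm2_via_dot(huge), norm2_scaled(huge))
```

For data whose squares stay comfortably within double-precision range, both functions agree, which is consistent with the post's observation that the dot-product route gives a usable norm.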

    Replacing gcc with icc, but running into a performance problem

    Hi, all,

    Our team is trying to replace gcc, which we have used for years, with icc. However, we find that the icc-compiled executable performs worse than the gcc-compiled one.

    We used a script to generate some small demos to test this; they are all in the attached test.tar.gz.

    Test machine CPU: Intel(R) Xeon(R) CPU E7-4850 v2 @ 2.30GHz
    OS: CentOS 6.6
    GCC: 4.7.2
    ICC: parallel_studio_xe_2015_update3
