Parallel Computing

OpenMP spinning time


I am running a simple Merge Sort benchmark on the Xeon Phi. 78% of the total CPU time is consumed by "libiomp5.so".

I tried to reduce the time wasted by the OpenMP runtime library by setting "export KMP_BLOCKTIME=0". Please note that the application is running natively on the MIC. I have also tried "export OMP_WAIT_POLICY=passive". No effect!

Why does this have no effect on the execution time or the wasted CPU time?

Thank you.

link error: libmkl_core.a depends on Open MPI (via libmkl_blacs_openmpi_lp64.a)

While invoking the Intel icpc compiler 2015.2.164, I encountered a link error when linking against libmkl_core.a:

/opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.a(cpardiso_blacs_lp64.o): In function 'mkl_pds_lp64_cpardiso_mpi_barrier':

__work/lnx32e/_cpardiso/kernel/mpi_wrapper/cpardiso_blacs_lp64_h.f:(.text+0x6): undefined reference to 'MKL_Barrier'

No SITE annotations were encountered?

Dear All,

I have just installed Parallel Studio 2016 and am trying to use memory access pattern analysis with Intel Advisor. I have added annotations to the source file like:

for (int i=0; i < nt.n_presyn; ++i) {


When I run the analysis with Advisor, I get:

calling flowgraph operators()() through shared library

Does anyone know if this is possible?...

The goal is to create a FlowGraph in which each function is a class loaded from a shared library, probably using Boost::extension to maintain portability between Windows and Linux.

Just a little concerned about:

- calls to the operator() to run the flow graph function

- any performance penalties this may incur

Any ideas?



Haswell and crosslan


I built code for integral image computation with SSE, and it works quite well. But I have serious problems making use of AVX/AVX2. I run my code on an i5-4460.

The basis: for the integral image I need a row sum (a running sum along each row), which is not optimal for vector units but can be done with shuffle and add. As a second step, I need to broadcast the last element to all elements, which can be done with a shuffle.

Now with AVX, there is no full shuffle for 32-bit elements, but I can do it with a normal shuffle and _mm256_permute2f128_ps.
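For reference, the SSE variant of the row sum described in this post (shift-and-add inside a register, then broadcast the last element as the carry for the next block) could look roughly like the sketch below. It is not the poster's code: the function names prefix_sum4 and row_prefix_sum are hypothetical, the data type is assumed to be float, and the in-register shift is done with the byte shift _mm_slli_si128 instead of a float shuffle, which is a substitution made here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* Inclusive prefix sum of the four floats in x:
   [a, b, c, d] -> [a, a+b, a+b+c, a+b+c+d]. */
static inline __m128 prefix_sum4(__m128 x)
{
    /* Shift up by one element (4 bytes), zero-filling element 0, then add. */
    x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4)));
    /* Shift up by two elements (8 bytes) and add again. */
    x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8)));
    return x;
}

/* Hypothetical helper: running sum over one image row of n floats. */
void row_prefix_sum(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);

    __m128 carry = _mm_setzero_ps();  /* sum of everything before this block */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 x = prefix_sum4(_mm_loadu_ps(in + i));
        x = _mm_add_ps(x, carry);
        _mm_storeu_ps(out + i, x);
        /* Broadcast the last element to all four positions. */
        carry = _mm_shuffle_ps(x, x, _MM_SHUFFLE(3, 3, 3, 3));
    }

    /* Scalar tail for row lengths that are not a multiple of 4. */
    float c = _mm_cvtss_f32(carry);
    for (; i < n; ++i) {
        c += in[i];
        out[i] = c;
    }
}
```

With 256-bit AVX registers the same shifts and shuffles act on each 128-bit lane separately, so the total of the low lane additionally has to be moved into the high lane; that cross-lane step is where _mm256_permute2f128_ps comes in.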

Visual Studio 2013 crashes when attempting to build project

I'm using Visual Studio 2013 Update 4 on Windows 7 Pro with Intel VTune 2015 Update 2.  I've been using VTune and Visual Studio just fine for the past few days, but I've recently hit a massive roadblock.  Visual Studio will crash with a null reference exception whenever I try to build the working solution.

Here's the call stack error:

ippsDotProd_32f Performance on Haswell CPU


At the moment I'm using ippsDotProd_32f in IPP 7.0 quite extensively in one of my projects. I have now tested IPP 8.2 on a Haswell CPU (Xeon E5-2650 v3 in an HP Z640 workstation) with this project because I expected it to be significantly faster (see below). Actually, the code was about 10% slower using IPP 8.2, which I found quite disturbing.

icc -mmic: Error: `xxx' is not supported on `k1om'

I am compiling an app with "icc -mmic", and it needs the OpenSSL library. So I tried to compile OpenSSL with "icc -mmic" as well, but that failed. Maybe it's a problem with Perl? I don't know how to deal with it. Any advice? Thanks!


How to install VTune Amplifier 2015 vtsspp on MPSS 2.1.6720

I'm using mpss_gold_update_3-2.1.6720, with a uos version of

I'm now trying to install VTune Amplifier 2015 on this MPSS, so I need sep3_15-k1om- and vtsspp-k1om-. But I can find only sep3_15-k1om- and vtsspp-k1om-, not vtsspp-k1om-
