Run a function on mic0 and mic1 concurrently (OpenMP)


I am implementing an OpenMP offload program that currently runs on a single Phi device.
My compute node has 2 Phi devices, named mic0 and mic1.
Could you please let me know how I can extend the following code section to run on both devices simultaneously?

Assume the code sections can run independently of each other, with all data (input and output) independent.
I need to know the code structure and the directives required to do this properly in OpenMP. I also need to know how to compile it.

Surface sharing between OPENCL and DirectX

I am working on a Decode-OpenCL-Encode pipeline on an Intel processor. There is sample code provided by Intel for media interop.

If we look at the DecodeOneFrame() function below: 

    // decode next frame and put result to output surface
    mfxStatus CDecodingPipeline::DecodeOneFrame(int Width, int Height, IDirect3DSurface9 *pDstSurface, IDirect3DDevice9* pd3dDevice)

Error when expanding benchmarks to two or more nodes

Hello, I'm running mp_linpack on two nodes (I have already worked out how to run it on one node :) ). I changed directory to mp_linpack/bin_intel/intel64 and modified HPL.dat to set P=1, Q=2. Then I created a hosts file, whose content is:


mic01 and mic03 are the two nodes where I want to run linpack. Then I used the command from MKL_userguide.pdf:

mpirun --prefix 1 -n 2 -hosts Node1,Node2 \
-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl_offload_intel64

How to install Vtune Amplifier 2015 vtsspp on MPSS2.1.6720

I'm using mpss_gold_update_3-2.1.6720, with a uos version of

I'm now trying to install VTune Amplifier 2015 on this MPSS, so I need sep3_15-k1om- and vtsspp-k1om-. However, I can find only sep3_15-k1om- and vtsspp-k1om-, but not vtsspp-k1om-

Q on memory comparison optimization

Hi All,

I am using AVX/SSE instructions to replace memcmp. Our workload mostly compares 32 bytes, and occasionally 64 or 128 bytes. I am using the following function, cmp32, for 32-byte comparisons and calling it 2 times for 64 bytes or 4 times for 128 bytes, but I am getting barely 1% performance improvement. Testing was done on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04 x86_64.

I tried replacing the following lines:

    vcmp = _mm256_cmpeq_epi64(xmm0, xmm1);
    vmask = _mm256_movemask_epi8(vcmp);
