Intel® Many Integrated Core Architecture (Intel MIC Architecture)

OpenMP spinning time


I am using a simple Merge Sort benchmark on the Xeon Phi. 78% of the total CPU time is consumed by ""

I tried to reduce the time wasted by the OpenMP runtime library by setting "export KMP_BLOCKTIME=0". Please note that the application runs natively on the MIC. I have also tried "export OMP_WAIT_POLICY=passive". No effect!

Why does this have no effect on either the execution time or the wasted CPU time?

Thank you.
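KMP_BLOCKTIME and OMP_WAIT_POLICY only change how idle OpenMP threads wait (spinning versus sleeping). They do not give those threads any more work, so they are not expected to shorten the run when threads are idle because there is nothing for them to do. In that case the structure of the sort matters more. Below is a minimal sketch of one common structure, a task-based merge sort with a serial cutoff. It assumes the benchmark sorts an int array. The function names and the SERIAL_CUTOFF value are hypothetical and are not taken from the original benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Below this many elements, sort serially: very small tasks cost more in
   scheduling overhead than they gain in parallelism. Illustrative value
   (hypothetical); tune it for the target machine. */
#define SERIAL_CUTOFF 4096

/* Merge the sorted runs a[lo, mid) and a[mid, hi) using tmp as scratch. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Sort a[lo, hi). Large ranges spawn a task for the left half; the
   current thread sorts the right half, then waits before merging. */
static void msort_rec(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;
    size_t mid = lo + (hi - lo) / 2;
    if (hi - lo < SERIAL_CUTOFF) {
        msort_rec(a, tmp, lo, mid);
        msort_rec(a, tmp, mid, hi);
    } else {
        #pragma omp task
        msort_rec(a, tmp, lo, mid);
        msort_rec(a, tmp, mid, hi);
        #pragma omp taskwait
    }
    merge(a, tmp, lo, mid, hi);
}

/* Sort a[0, n) in ascending order. Returns 0 on success, or -1 if the
   scratch buffer cannot be allocated. One thread starts the recursion;
   the rest of the team executes the tasks it creates. */
int parallel_merge_sort(int *a, size_t n)
{
    if (n < 2)
        return 0;
    assert(a != NULL);
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    #pragma omp parallel
    #pragma omp single
    msort_rec(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

Build this with the compiler's OpenMP switch. Without that switch, the pragmas are ignored and the sort runs serially. The cutoff keeps the number of tiny tasks low, because scheduling them can cost more than sorting them.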

Run a function on mic0 and mic1 concurrently (OpenMP)


I am implementing an OpenMP offload program that currently runs on a single Phi device.
My compute node has two Phi devices, named mic0 and mic1.
Could you please tell me how to extend the following code section so that it runs on both devices simultaneously?

Assume the code sections can run independently of each other, and that all data (input and output) are independent.
I need to know how to structure the code and which directives to use to do this properly in OpenMP. I also need to know how to compile it.
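Below is a minimal sketch of one common pattern. It assumes the Intel compiler's offload pragmas. Two host OpenMP threads run, one per card, and each drives a synchronous offload to its own target(mic:N). The function names and the doubling workload are hypothetical placeholders for the independent code sections. The code would be compiled with something like `icc -openmp two_cards.c` (newer compilers spell the switch `-qopenmp`). Compilers without Intel offload support ignore the offload pragmas and run everything on the host.

```c
#include <assert.h>

/* Everything between push and pop is also compiled for the coprocessor,
   so it can be called inside an offload region. Compilers without Intel
   offload support ignore these pragmas. */
#pragma offload_attribute(push, target(mic))
/* Hypothetical stand-in for one independent code section: out = 2 * in. */
static void scale_by_two(float *in, float *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = 2.0f * in[i];
}
#pragma offload_attribute(pop)

/* Runs one independent chunk on each card at the same time: each host
   thread of the sections construct drives one synchronous offload. */
void run_on_two_cards(float *in0, float *out0, int n0,
                      float *in1, float *out1, int n1)
{
    assert(n0 >= 0 && n1 >= 0);
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            /* in() copies the input to mic0; out() copies the result back. */
            #pragma offload target(mic:0) in(in0 : length(n0)) out(out0 : length(n0))
            scale_by_two(in0, out0, n0);
        }
        #pragma omp section
        {
            #pragma offload target(mic:1) in(in1 : length(n1)) out(out1 : length(n1))
            scale_by_two(in1, out1, n1);
        }
    }
}
```

The two offloads overlap because each one blocks only its own host thread. The offload `signal` and `wait` clauses are another way to overlap them from a single host thread.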

Error when expanding benchmarks to two or more nodes

Hello, I'm running mp_linpack on two nodes (I have figured out how to run it on one node :) ). I changed the path to mp_linpack/bin_intel/intel64 and modified HPL.dat to set P=1, Q=2. Then I created a hosts file, whose content is:


mic01 and mic03 are the two nodes on which I want to run Linpack. Then I used the command from MKL_userguide.pdf:

mpirun --prefix 1 -n 2 -hosts Node1,Node2 \
-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH ./xhpl_offload_intel64

How to install VTune Amplifier 2015 vtsspp on MPSS 2.1.6720

I'm using mpss_gold_update_3-2.1.6720, with a uos version of

I'm now trying to install VTune Amplifier 2015 on this MPSS, so I need sep3_15-k1om- and vtsspp-k1om-. But I can find only sep3_15-k1om- and vtsspp-k1om-, not vtsspp-k1om-.

mic stuck at booting state

[root@amax ~]# micctrl -s
mic0: booting (mode: linux image: /lib/firmware/mic/uos.img)
mic1: booting (mode: linux image: /lib/firmware/mic/uos.img)

I have two MIC cards, but they get stuck in the booting state and never come online. Whether I reboot the host or restart the mpss service, they stay in this state. Any help, please? Thank you very much!

Simple offloaded code, enormous time consumption

Dear all,

I recently started using Xeon Phi cards for parallel programming, so I am still a newbie in this field.

I wrote this code as a simple example to start understanding this fascinating world, but I was surprised when I looked at the execution times.

When I run the code on the host, the execution time is 0.08 s. When I add the offload pragma and the omp parallel for pragma, the execution time increases to 9 s!

I compiled both versions with -O3 optimization.

Is there something I am missing?
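One thing worth ruling out is that a single timed offload also includes one-time coprocessor setup and the data transfers, and these can dwarf a 0.08 s computation. Below is a minimal sketch that times an empty warm-up offload separately from the offload that does the work. It assumes the Intel compiler's offload pragmas. `add_arrays` and `time_offload` are hypothetical stand-ins for the original code, which is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

#pragma offload_attribute(push, target(mic))
/* Hypothetical kernel standing in for the original loop: z = x + y. */
static void add_arrays(double *x, double *y, double *z, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}
#pragma offload_attribute(pop)

/* Times an empty warm-up offload (the first offload pays the one-time
   device setup) separately from the offload that does the work, whose
   time includes the in/out data transfers. */
void time_offload(double *x, double *y, double *z, int n,
                  double *t_warmup, double *t_compute)
{
    assert(n >= 0);
    double t0 = now_seconds();
    #pragma offload target(mic)
    {
        /* Intentionally empty: only triggers device initialization. */
    }
    double t1 = now_seconds();
    #pragma offload target(mic) in(x : length(n)) in(y : length(n)) out(z : length(n))
    add_arrays(x, y, z, n);
    double t2 = now_seconds();
    *t_warmup = t1 - t0;
    *t_compute = t2 - t1;
}
```

If most of the time shows up in the warm-up measurement, the cost comes from setup rather than from the loop itself. A loop this small may still run faster on the host because of the transfer overhead.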

