Symmetric mode does not run and hangs the MIC

I have set up a system with Intel Xeon processors and Intel Xeon Phi coprocessors under CentOS 7 (kernel 3.10.0-327) with Intel Composer XE 2015.2.164. Installation of MPSS 3.6.1 went smoothly, and bringing up the MIC environment over NFS from the host works fine. I am able to compile MPI applications (e.g., https://software.intel.com/en-us/articles/using-the-intel-mpi-library-on-intel-xeon-phi-coprocessor-systems) for both the host and the coprocessors (with -mmic).

MPSS Unable to start

Hello, my Xeon Phi was working for about a month and then suddenly stopped working. Following the Intel MPSS Debugging Flowchart for Linux, I worked through the steps down and across the bottom, eventually arriving at resetting the configuration. Unloading works and cleanconfig gives no message, but initdefaults gives:

[Warning] mic0: Generating compatibility network config file /opt/intel/mic/filesystem/mic0/etc/sysconfig/network/ifcfg-mic0 for IDB.
[Warning]       This may be problematic at best and will be removed in a future release, Check with the IDB release.

question on cache eviction

I'm experimenting on cache eviction in my code using _mm_clevict(), and have the following observations:

(1) if data are evicted from L1 alone (_MM_HINT_T0), the code is slowed by ~30%.

(2) if data are evicted from L2 alone (_MM_HINT_T1), the code is slowed by ~70%.

(3) if data are evicted from both L1 and L2, the code is still slowed by ~70%.

I'm confused by (3): compared with (2), some additional loads should now miss both L1 and L2, which I would expect to add further latency. Any thoughts? Thanks!

How can I offload more than one part of a whole array?

I have a global array A that contains all my data, and I may use different parts of A in offload sections. How do I copy only the parts I am interested in (more than one) to the MIC, rather than copying the whole array? I have an example; how can I make it work? Thanks,

Low performance on MIC

The HPL benchmark performance obtained on a host + 1 MIC card is only 154 GFlops. The host system has 102 GB of memory. The theoretical peak is 1.2 TF (card) + 256 GFlops (host) = about 1.4 TF. May I please know how to optimize the HPL performance? I used offload execution, with the executable xhpl_offload_intel64. When I run the HPL benchmark on the host alone, I achieve 92% of peak. I am attaching all the files that I am using. Awaiting your quick reply.

How to install 'locale' on Xeon Phi?

Hi. I want to install the 'locale' package on the Xeon Phi. How do I do this?

It's not like a library, which I could cross-compile and upload to the board; 'locale' involves environment variables and files. How can I install it? Is there a way to add this package to the Xeon Phi's Linux image on the host?


Native MPI

I would like to clear up some confusion regarding the native MPI paradigm on the Intel Xeon Phi. Is it possible to use MPI to create two tasks on two separate Xeon Phi coprocessors running natively? That is, I want to use two coprocessors as two independent, separate nodes without any involvement of the hosts.
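Since each coprocessor runs its own Linux with its own IP address, it can act as an ordinary MPI node. A hedged sketch of a host-free launch with Intel MPI's MPMD colon syntax (the hostnames `mic0`/`mic1`, the rank counts, and the binary name `app.mic` are assumptions; the MPI runtime libraries must be present on the cards):

```shell
# Enable coprocessor support in Intel MPI
export I_MPI_MIC=enable

# Launch the natively built binary (compiled with -mmic) with ranks
# placed only on the two coprocessors -- no ranks on the host at all.
mpirun -host mic0 -n 30 ./app.mic : -host mic1 -n 30 ./app.mic
```

The host merely runs the launcher; all ranks live on the cards, so the two coprocessors behave as two independent nodes.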

project rebuild error - #11012


I am just starting to learn how to program the Intel Xeon Phi card. I recently wrote my first, very simple program (to copy an array from host to card memory) but am unable to rebuild it successfully.

Development environment details:

-  Operating System: Windows Server 2012.

-  Compiler: Intel Compiler 2016.

-  Visual Studio 2013.

-  Programming language: C++.

-  Intel Xeon Phi card model: 6 GB memory, 1.1 GHz clock speed, 57 cores.

-  I have installed the relevant drivers for Windows as per the Intel guide.

OpenMP 4.0 Copying partial arrays to the device - weird behavior

Hello all.  I am trying to reduce my offload costs by copying only the data that I need.

!original copy
!$omp target update from(my_array)

!this works (literal bounds)
!$omp target update from(my_array(1:1000,1))
!$omp target update from(my_array(1:1000,2))
!$omp target update from(my_array(1:1000,3))

!this doesn't (variable bounds)
!$omp target update from(my_array(var1:var2, 1))
!$omp target update from(my_array(var1:var2, 2))
!$omp target update from(my_array(var1:var2, 3))
!starts copying gigabytes (????) of data, or just seg faults


Any ideas?
