Intel® Many Integrated Core Architecture (Intel MIC Architecture)

Install Manual Change

I have had several issues rebuilding MPSS on CentOS 7/RHEL and later. Here is a proposed change to the manual to fix them.

Problem symptoms: in section

the command: 

yum install kernel-headers kernel-devel 

does not install all of the right sources, and the build fails.

This command should be used in its place:

sudo yum install "kernel-devel-uname-r == $(uname -r)"

This puts everything in the right place under

/lib/modules/$(uname -r)/build (which is a symlink)
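
Before rebuilding, it may help to confirm that the build link for the running kernel actually resolves. The following is only a minimal sketch: check_kernel_build is a hypothetical helper name (not part of MPSS or yum), and the optional second argument that overrides the modules root exists only to make the check easy to try out.

```shell
#!/bin/sh
# check_kernel_build is a hypothetical helper, not part of MPSS or yum.
# Usage: check_kernel_build [kernel-release] [modules-root]
# Defaults: the running kernel release and /lib/modules.
check_kernel_build() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    build="$root/$release/build"
    # -d follows symlinks, so a dangling "build" link fails this test.
    if [ -d "$build" ]; then
        echo "ok: $build"
        return 0
    fi
    echo "missing: $build" >&2
    return 1
}
```

Calling check_kernel_build with no arguments checks the running kernel; a non-zero exit status suggests the matching kernel-devel package is not installed.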

LIBXSMM Prefetching on Xeon Phi


I am attempting to use LIBXSMM on a Xeon Phi and it fails to compile when I use

    make install OFFLOAD=1 PREFETCH=1 MNK="2,4,6,8,10,12,14,16,18,20,23"

I am getting the error

    ~/libxsmm/samples/smm/specialized.cpp(143): error: call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type
            xmm(pa, pb, pc LIBXSMM_PREFETCH_ARGA(pa + asize) LIBXSMM_PREFETCH_ARGB(pb + bsize) LIBXSMM_PREFETCH_ARGC(tmp) LIBXSMM_PREFETCH_ARGC(pc + csize_act));

Strange behavior of Xeon Phi 31s1p


I am experiencing problems with a Xeon Phi 31s1p.

The motherboard is an Asus P9X79WS (BIOS version 4802, the most recent), the processor is an Intel i7 3820 @ 3.6 GHz, the video card is a Sapphire Radeon HD 7990, and the operating system is Windows 10.

The option for Xeon Phi is activated in the BIOS.

The MPSS version is 3.6.


Hey Guys, 

I have been trying hStreams these days and have run into the following issue.

When I run the tiled LU example (from the hStreams ref_code), it reports "hStreams_EventWait1: returns: HSTR_RESULT_REMOTE_ERROR: ... segmentation fault"  (core dumped) ./lu_tiled_hstreams -m 6000 -t 10 -s 8 -l row -i 3

The point is that the code sometimes runs and exits normally, and at other times reports this error.

Note that I am using MPSS 3.6. 


1st generation Intel® Xeon Phi™ coprocessor to Knights Landing upgrade program


During various presentations on Knights Landing, some people mentioned that there is an upgrade program from the first generation of the Intel Xeon Phi (codenamed Knights Corner) to the second generation, Knights Landing.

Can someone explain whether this program allows owners of a 1st generation Xeon Phi to upgrade to the 2nd generation through some kind of special offer?

If yes, is there an application form that needs to be filled, or maybe an email address to contact?

Running Xeon Phi using Docker

Hello, I am trying to configure and access a Xeon Phi from a Linux container running a CentOS image.

Host OS: 4.2.0-coreos-r1, and I am running the CentOS image as a Linux container. When I try to install the MPSS library, it breaks in the build phase with the following error message. Initially it could not find the environment variable $(DESTDIR), so I assigned a folder to it. It still breaks, and I cannot find the reason.

Sequential Performance on the Xeon Phi

Hi, I have been running different benchmarks on the Xeon Phi. In comparison with an E5-2620 Xeon processor running at 2.00 GHz, I noticed a large difference in sequential performance (almost 10x across different cases).

Can we conclude that the Xeon Phi always shares the core frequency between hardware threads, even for sequential code? In other words, is the clock frequency of 1.053 GHz divided by 4 (if the core switches between hardware threads in a round-robin fashion)?

If that is true, is there any way to take advantage of the core's full frequency?
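
The arithmetic behind this hypothesis can be spelled out. The sketch below only evaluates the assumption stated in the question (the core clock split evenly across 4 hardware threads); it does not measure or confirm how the coprocessor actually schedules threads, and effective_ghz and slowdown are hypothetical helper names.

```shell
#!/bin/sh
# Arithmetic for the hypothesis in the question only; this is not a
# measurement of real hardware behaviour.

# Per-thread clock in GHz if a core clock were split evenly across N threads.
# Usage: effective_ghz <core_ghz> <threads>
effective_ghz() {
    awk -v f="$1" -v n="$2" 'BEGIN { printf "%.3f\n", f / n }'
}

# Ratio between a reference CPU clock and that per-thread clock.
# Usage: slowdown <reference_ghz> <core_ghz> <threads>
slowdown() {
    awk -v ref="$1" -v f="$2" -v n="$3" 'BEGIN { printf "%.1f\n", ref / (f / n) }'
}

effective_ghz 1.053 4      # prints 0.263
slowdown 2.00 1.053 4      # prints 7.6
```

Under that assumption, clock rate alone would account for a factor of roughly 7.6 against the 2.00 GHz Xeon. That figure says nothing about other architectural differences between the two processors.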

Code producing segmentation fault on offload with -openmp option

Hi all,

My code has a module state_test which makes a call to state.

When I offload the call to state, I get a segfault.

Here is the call to state:

!dir$ offload begin target(mic:0) in(TRCR) out(RHOK1,RHOK2,RHOK3,RHOK4)
      call state(k,kk,TRCR(:,:,:,1), TRCR(:,:,:,2), this_block,RHOK1,RHOK2,RHOK3,RHOK4)
!dir$ end offload

However, when the code is built without the -openmp option, the offload works fine.

Here is the compile line that produces the segfaulting binary:

ifort state_test.F90 state_mod.F90 -openmp

First touch time greater than parallel time

Hi all,

I am looking to parallelize my code for speedup.

As the Xeon Phi is a NUMA system, I used first-touch placement of the data.

While the Xeon Phi no doubt performs better than the Xeon, the problem is that the total time (first-touch time + loop time) is greater.

How do I resolve this issue?

This code, when integrated into the main code (which I cannot post here), will call the state function many times from various places. So is it possible that, even if I don't first touch the data as I do in the code attached below, this overhead is just a one-time cost?
