Intel® Many Integrated Core Architecture (Intel MIC Architecture)

31S1P: BSOD or Device Connection Lost


I am seeing instabilities with my Xeon Phi 31S1P. With or WITHOUT(!) a job running on the coprocessor, events like the following appear:

june 16 2015 21:02:31: Warning: mic0: Device connection lost!
june 16 2015 21:03:47: Information: mic0: Device connection restored


With an OpenCL job the situation gets worse: it may lose the connection or raise a blue screen of death :) with IRQL_LESS_OR_EQUAL in some MPSS DLL (I will take a screenshot later).

Data alignment problem

Hi there

I was trying to offload some computation to the MIC using "pragma", sending data addressed by a pointer p. How can I ensure the alignment of the data on the MIC after the MIC has received it? Does "__assume(p, 64)" work? I was trying to use intrinsics to load data into the vector register file, which requires the data to be aligned.

Another problem: I was trying to activate lots of threads for the calculation using "#pragma omp parallel for", and some arrays inside the loop must be thread-private while also being 64-byte aligned.


Windows 7 x64 Pro, MS VS 2013, Intel Parallel Studio XE 2015 update 4, MPSS 3.4.3

Building the MIC sample LEO_tutorial as Release x64, and "Start Without Debugging"

Without OFFLOAD_DEVICES environment variable set, runs OK

With OFFLOAD_DEVICES=0 environment variable set, runs OK

(I have 2 5110P's)

With OFFLOAD_DEVICES=0,1 environment variable set, hangs 60 seconds, reports error
With OFFLOAD_DEVICES=1 environment variable set, hangs 60 seconds, reports error

Jim Dempsey

Measuring offload processing time with clock_gettime() and SCIF API


Hi, I recently built an app that sends data to the MIC, processes it, and returns it.

I implemented the whole thing with just pthreads to get as much transparency as possible.

Problem is, I'm not sure I'm measuring the offload latency right.

I currently built it so that it takes 4 timestamps:

offload begin (from host) - (scif transfer) - remote processing begin (from mic) - (actual processing) - remote processing end (from mic) - (scif transfer back to host) - offload end (from host)

Asking for recommendations for a socket-like, efficient API to communicate with MIC

I am porting a server-client program to the MIC. It has high concurrency and a massive amount of data to transmit.

The server side will run on the MIC and provide a computing service to the client on the host.

There are more than 100 threads that together transmit more than 10 GB of data in total. On clusters, it was implemented using the socket API.

So I was wondering whether there is a socket-like, efficient API that would let me adapt this program to the MIC easily and efficiently.

Could you list some methods and give some references from which I can learn more?

Thanks a lot.


Asynchronous data transfer does not work

I am trying to perform an asynchronous data transfer to an Intel Xeon Phi. Note that asynchronous computation works as expected. If I try to combine data transfer and computation (in an offload statement), the timing indicates that the data transfer is done synchronously while the following computation is done asynchronously.

A test example that illustrates the point is given below. The output is 
0.928997 0.288048
which indicates that almost a second is spent in the asynchronous call while only 0.28 seconds are spent waiting for that asynchronous call.

offload inside parallel region: problem with private allocatable


I'm trying to offload some computation inside a parallel OMP region. I have problems with a PRIVATE allocatable array. I paste here simple code that shows the problem. The first time the offloaded code works as it should. The second time, the ALLOCATABLE variable p4 is not updated on the MIC.

The output is:

Offload pointer in struct


I'm new to MIC programming, so this is probably a silly question, but I've searched and could not find a solution.

I have a struct with a pointer in it and I want to offload that pointer (the array) to MIC. AFAIK, I can't offload the whole struct, because it's not bitwise copyable. But I was hoping I could offload just the pointer, as a normal array. Below is a minimal example, which segfaults. What am I doing wrong?

As a side question, what are my options if I need to copy the whole struct?
