How to run a hybrid MIC and CPU computation without copy-and-paste


I know I can split work between the MIC and the CPU by using a synchronized offload directive.

But I have one question: how can I do that without copying and pasting code?

For example, take a vector addition:

#pragma omp parallel for
for(i=0;i<N;i++) C[i] = A[i] + B[i];

I can make it hybrid like this:

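/* first half: offloaded asynchronously to the coprocessor */
#pragma offload target(mic:0) inout(C[0:N/2]: alloc(C[0:N/2])) signal(&sig)
#pragma omp parallel for
for(i=0;i<N/2;i++) C[i] = A[i] + B[i];

/* second half: runs on the host in the meantime */
#pragma omp parallel for
for(i=N/2;i<N;i++) C[i] = A[i] + B[i];

/* block until the offloaded half has finished */
#pragma offload_wait target(mic:0) wait(&sig)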
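
One possible way to avoid the duplicated loop, sketched here under some assumptions, is to move the loop into a single function that is compiled for both the host and the coprocessor, and to call that function once inside the offload region and once on the host. The names vec_add, hybrid_vec_add and MIC_TARGET are hypothetical and not from the original post. The sketch assumes the Intel compiler's offload extensions and uses its __INTEL_OFFLOAD macro as a guard, so with any other compiler both halves simply run on the host.

```c
#include <stddef.h>

/* Assumption: the Intel compiler defines __INTEL_OFFLOAD when offload is enabled.
   MIC_TARGET is a hypothetical helper macro introduced for this sketch. */
#ifdef __INTEL_OFFLOAD
#define MIC_TARGET __attribute__((target(mic)))
#else
#define MIC_TARGET
#endif

/* The loop body lives in exactly one place and is built for host and MIC. */
MIC_TARGET void vec_add(float *a, float *b, float *c, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical wrapper: first half on the coprocessor, second half on the host. */
void hybrid_vec_add(float *A, float *B, float *C, int n)
{
    int half = n / 2;
    float *a0 = A, *b0 = B, *c0 = C;  /* pointers to the offloaded half */
    char sig = 0;                     /* tag used by signal/wait */
    (void)sig;                        /* unused when offload is disabled */

#ifdef __INTEL_OFFLOAD
    /* Asynchronous offload: returns immediately, the MIC works in the background. */
    #pragma offload target(mic:0) in(a0, b0 : length(half)) out(c0 : length(half)) signal(&sig)
#endif
    vec_add(a0, b0, c0, half);

    /* The host processes the remaining elements with the same function. */
    vec_add(A + half, B + half, C + half, n - half);

#ifdef __INTEL_OFFLOAD
    /* Block until the offloaded half has been computed and copied back. */
    #pragma offload_wait target(mic:0) wait(&sig)
#endif
}
```

Calling hybrid_vec_add(A, B, C, N) then replaces both copies of the loop. The split point n / 2 is only a placeholder; a real split would probably be tuned to the relative speed of the host and the card.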


Offload compilation problem with -openmp option.

Hi all!

I have problems using OpenMP and offload directives together. The following (reduced) code gives the right result (1  2  3  4  5  0  0  0  0  0) when it is compiled without OpenMP ("ifort test.f -o test"), and the wrong result (1  2  3  4  5  6  7  8  9 10) when it is compiled with OpenMP ("ifort -openmp test.f -o test").


Installing MPSS


I recently obtained a Xeon Phi and have been trying to install it. I have CentOS 6.5. I followed the instructions and installed the MPSS service using yum install MPSS. After installing, I run lsmod | grep mic and I can see the mic module. Furthermore, I can see the Xeon Phi in the output of lspci -vv.


clCreateBuffer(... | CL_MEM_USE_HOST_PTR): When does the OpenCL framework transfer data to the device via PCI?


The Intel Xeon Phi OpenCL optimization guide suggests using mapped buffers for data transfer between host and device memory. The OpenCL spec also states that this technique is faster than writing data explicitly to device memory. I am trying to measure the data transfer time from host to device and from device to host.

My understanding is that the OpenCL framework supports two ways of transferring data.

Here is my summarized scenario:

a. Explicit Method:

    - Writing: clEnqueueWriteBuffer(...)

Issue with MPI communication with two MIC cards and a Xeon processor


I am running an MPI application (involving 5 ranks) that runs smoothly when all ranks are on the Xeon processor. But when I put two ranks on MIC0 and MIC1, the program hangs and then gives me a segmentation fault.


Using blocking MPI send and non-blocking MPI recv:

rank0, rank1 on MIC0, MIC1

rank2, rank3, rank4 on Xeon


rank1 --> sends 100 packets and reaches finalize()

rank2 --> only receives 60 packets and then hangs

Some things I tried:

A few issues with MIC and mpssd


As I described in another post, mpss-3.2.1 is running perfectly on kernel 3.13.10 (Fedora 20).
I can run programs on the processor and have no problems, except:

1. The mpssd daemon takes 100% of one host CPU all the time.

2. If mpssd is not started, the coprocessor fan is always on. When mpssd starts, the fan goes off whenever there is no activity. Since people may not want to use the card all the time, it seems the default fan speed should be off or low.
