Windows 7 x64 Pro, MS VS 2013, Intel Parallel Studio XE 2015 update 4, MPSS 3.4.3

I built the MIC sample LEO_tutorial as Release x64 and ran it with "Start Without Debugging".

Without the OFFLOAD_DEVICES environment variable set, it runs OK.

With OFFLOAD_DEVICES=0 set, it runs OK.

(I have two 5110Ps.)

With OFFLOAD_DEVICES=0,1 set, it hangs for 60 seconds, then reports an error.
With OFFLOAD_DEVICES=1 set, it hangs for 60 seconds, then reports an error.

Jim Dempsey

Measuring offload processing time with clock_gettime() and SCIF API


Hi, I recently built an app that sends data to the MIC, processes it there, and returns the results.

I implemented the whole thing with just pthreads to get as much transparency as possible.

Problem is, I'm not sure I'm measuring the offload latency right.

It currently takes 4 timestamps:

offload begin (from host) - (SCIF transfer) - remote processing begin (from MIC) - (actual processing) - remote processing end (from MIC) - (SCIF transfer back to host) - offload end (from host)
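One way to combine the four timestamps, sketched below under some assumptions: the host and coprocessor clocks are not assumed to be synchronized, so only timestamps taken on the same side are ever subtracted from each other. The names offload_stamps, now_monotonic, elapsed_ns, remote_processing_ns and transfer_overhead_ns are hypothetical, and CLOCK_MONOTONIC is an assumed clock choice, not something taken from the post.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical container for the four timestamps described in the post. */
struct offload_stamps {
    struct timespec host_begin; /* taken on the host before the transfer   */
    struct timespec mic_begin;  /* taken on the MIC before processing      */
    struct timespec mic_end;    /* taken on the MIC after processing       */
    struct timespec host_end;   /* taken on the host after results arrive  */
};

/* Read the monotonic clock of whichever side this code runs on. */
static struct timespec now_monotonic(void)
{
    struct timespec ts = {0, 0};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Nanoseconds between two readings of the SAME clock. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Time spent in the actual processing, measured entirely on the MIC. */
static int64_t remote_processing_ns(const struct offload_stamps *s)
{
    assert(s != NULL);
    return elapsed_ns(s->mic_begin, s->mic_end);
}

/* Host round trip minus remote processing: both transfers plus any
 * queuing or wake-up delay. Host and MIC stamps are never mixed in a
 * single subtraction, so clock offset between the two does not matter. */
static int64_t transfer_overhead_ns(const struct offload_stamps *s)
{
    assert(s != NULL);
    return elapsed_ns(s->host_begin, s->host_end) - remote_processing_ns(s);
}
```

In this sketch the MIC side would send its two timestamps (or just their difference) back with the results, and the host would call now_monotonic() immediately before sending and immediately after receiving. It cannot split the overhead into the outbound and return transfers; that would need synchronized clocks.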

Seeking recommendations for an efficient, socket-like API to communicate with the MIC

I am porting a client-server program to the MIC. It is highly concurrent and transmits a massive amount of data.

The server side will run on the MIC and provide a computing service to the client on the host.

More than 100 threads together transmit more than 10G of data in total. On clusters, the program was implemented with the socket API.

So I was wondering whether there is an efficient, socket-like API that would let me adapt this program to the MIC easily.

Could you list some options and point me to references where I can learn more?

Thanks a lot.


Neusoft Computed Tomography on Intel® Xeon® Processor E5-2600 v3



Neusoft Medical Systems Co., Ltd. is a leading manufacturer of medical equipment including Computed Tomography (CT) [2,8], Magnetic Resonance Imaging (MRI) [3], X-ray, Ultrasound, Positron Emission Tomography (PET) [4], Linear Accelerator, and In Vitro Diagnostic (IVD) [5]. For more information about the company, see [1].

    Asynchronous data transfer does not work

    I am trying to perform an asynchronous data transfer to an Intel Xeon Phi. Note that asynchronous computation works as expected. If I combine data transfer and computation in one offload statement, the timing indicates that the data transfer is done synchronously while the computation that follows is done asynchronously.

    A test example that illustrates the point is given below. The output is 
    0.928997 0.288048
    which indicates that almost a second is spent in the asynchronous call, while only 0.28 seconds are spent waiting for that asynchronous call to complete.

    offload inside parallel region: problem with private allocatable


    I'm trying to offload some computation inside an OpenMP parallel region, and I have a problem with a PRIVATE allocatable array. I paste here simple code that shows the problem. The first time, the offloaded code works as it should. The second time, the ALLOCATABLE variable p4 is not updated on the MIC.

    The output is:

    Offload pointer in struct


    I'm new to MIC programming, so this is probably a silly question, but I've searched and could not find a solution.

    I have a struct with a pointer in it and I want to offload that pointer (the array) to MIC. AFAIK, I can't offload the whole struct, because it's not bitwise copyable. But I was hoping I could offload just the pointer, as a normal array. Below is a minimal example, which segfaults. What am I doing wrong?

    As a side question, what are my options if I need to copy the whole struct?
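    A commonly suggested workaround is sketched below, under assumptions: the struct's pointer and length are copied into local variables, and those locals are named in the offload clause instead of the struct member. The names grid and scale_on_coprocessor are hypothetical and are not the poster's code. The pragma is guarded by __INTEL_OFFLOAD, so with a compiler that has no offload support the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct holding a pointer; not bitwise copyable as a whole,
 * because the pointer value is only meaningful on the host. */
struct grid {
    int n;
    double *values;
};

/* Scale every element, offloading only the array the struct points to. */
static void scale_on_coprocessor(struct grid *g, double factor)
{
    assert(g != NULL && g->values != NULL);

    /* Copy the members into locals: the offload clause below names these
     * plain variables rather than g->values. */
    double *values = g->values;
    int n = g->n;

#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) inout(values : length(n))
#endif
    {
        for (int i = 0; i < n; i++)
            values[i] *= factor;
    }
    /* With inout, the results are copied back into the host memory that
     * values (and therefore g->values) points to. */
}
```

    For example, with a three-element array {1.0, 2.0, 3.0} and a factor of 2.0, the host array holds {2.0, 4.0, 6.0} after the call. Copying a whole struct that contains several pointers would need the same treatment for each pointer member.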
