Server

Data alignment problem

Hi there

I was trying to offload some computation to MIC using "pragma", sending data addressed by a pointer p. How can I ensure the data is aligned on the MIC after the MIC has received it? Does "__assume(p, 64)" work? I was trying to use intrinsics to load data into the vector register file, which requires the data to be aligned.

Another problem: I was trying to activate many threads for the calculation using "#pragma omp parallel for", and some arrays inside the loop must be thread-private while also being 64-byte aligned.
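For the second question, here is a minimal sketch in portable C11 with OpenMP, not the poster's code. It assumes that an array declared inside the loop body is acceptable as the thread-private storage; such an array lives on each thread's own stack, and `_Alignas(64)` requests the alignment. The names `TILE`, `is_aligned` and `scale_with_private_scratch` are made up for illustration. On compilers without C11 support, `__attribute__((aligned(64)))` is the usual equivalent. For the first question, note that the Intel compiler hint is spelled `__assume_aligned(p, 64)` and only promises alignment to the compiler; it does not align anything. The offload `in`/`out` clauses are documented with an `align()` modifier that may be what is needed there, but that is worth checking against the compiler reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT 64 /* bytes, the width of one 512-bit vector register */
#define TILE 16      /* hypothetical scratch size: 16 floats = 64 bytes */

/* Returns 1 if ptr is a multiple of `alignment` bytes, otherwise 0. */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/*
 * Copies src * factor into dst through a per-thread scratch array.
 * n must be a multiple of TILE.
 * Returns how many loop iterations saw a misaligned scratch array
 * (0 if the alignment request was honored everywhere).
 */
static int scale_with_private_scratch(const float *src, float *dst,
                                      int n, float factor)
{
    int misaligned = 0;
    assert(n % TILE == 0);

    #pragma omp parallel for reduction(+:misaligned)
    for (int i = 0; i < n; i += TILE) {
        /* Declared inside the loop body, so every thread gets its own
           copy on its own stack; _Alignas requests 64-byte alignment. */
        _Alignas(ALIGNMENT) float scratch[TILE];

        if (!is_aligned(scratch, ALIGNMENT))
            misaligned++;

        for (int j = 0; j < TILE; j++)
            scratch[j] = src[i + j] * factor;
        for (int j = 0; j < TILE; j++)
            dst[i + j] = scratch[j];
    }
    return misaligned;
}
```

Calling `scale_with_private_scratch(src, dst, n, 2.0f)` and checking that it returns 0 is a cheap way to confirm the alignment before switching the inner loops to aligned vector loads.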

OFFLOAD_DEVICES broke

Windows 7 x64 Pro, MS VS 2013, Intel Parallel Studio XE 2015 update 4, MPSS 3.4.3

I built the MIC sample LEO_tutorial as Release x64 and ran it with "Start Without Debugging".

Without OFFLOAD_DEVICES environment variable set, runs OK

With OFFLOAD_DEVICES=0 environment variable set, runs OK

(I have 2 5110P's)

With OFFLOAD_DEVICES=0,1 environment variable set, hangs 60 seconds, reports error
With OFFLOAD_DEVICES=1 environment variable set, hangs 60 seconds, reports error

Jim Dempsey

Measuring offload processing time with clock_gettime() and SCIF API

Hi, I recently built an app that sends data to the MIC, processes it, and returns it.

I implemented the whole thing with just pthreads to get as much transparency as possible.

Problem is, I'm not sure I'm measuring the offload latency right.

I currently have it take 4 timestamps:

offload begin (from host) - (SCIF transfer) - remote processing begin (from MIC) - (actual processing) - remote processing end (from MIC) - (SCIF transfer back to host) - offload end (from host)
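One way to combine those four timestamps is sketched below. It assumes the host and coprocessor clocks are not synchronized, so only differences between timestamps taken on the same side are meaningful. Under that assumption, the transfer overhead in both directions can be estimated as the host round trip minus the processing time measured on the MIC. The helper names (`elapsed_ns`, `now_monotonic`, `transfer_overhead_ns`) are hypothetical, and `CLOCK_MONOTONIC` is my choice of clock, not something the post specifies.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from `start` to `end`. Both timestamps must come from the
   same clock on the same machine for the result to mean anything. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Reads the monotonic clock, which is not affected by wall-clock changes. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return ts;
}

/*
 * Estimated time spent outside the actual processing (both transfers
 * plus any queueing): host round trip minus remote processing time.
 * host_* are taken on the host, mic_* on the coprocessor; the two pairs
 * are never subtracted from each other directly.
 */
static int64_t transfer_overhead_ns(struct timespec host_begin,
                                    struct timespec host_end,
                                    struct timespec mic_begin,
                                    struct timespec mic_end)
{
    return elapsed_ns(host_begin, host_end) - elapsed_ns(mic_begin, mic_end);
}
```

This gives the sum of both transfer legs but cannot split it into the outbound and return legs; that would need a clock-offset estimate between host and MIC.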

Request for recommendations: an efficient, socket-like API to communicate with MIC

I am porting a client-server program to MIC. It has high concurrency and a massive amount of data to transmit.

The server side will run on the MIC and provide a computing service to the client on the host.

More than 100 threads together transmit more than 10G of data in total. On clusters, the program was implemented with the socket API.

So I was wondering whether there is an efficient, socket-like API that would let me adapt this program to MIC easily.

Could you list some methods and give some references from which I can learn more?

Thanks a lot.
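As one possible starting point, and assuming the coprocessor is reachable over TCP/IP through the MPSS virtual network interface, the existing sockets code might be kept unchanged at first. The sketch below shows the kind of full-length send/receive loop such code usually relies on, written against plain POSIX sockets. SCIF provides `scif_send()` and `scif_recv()` calls of a similar shape, so a wrapper like this is also a natural place to swap transports later. The helper names `send_all` and `recv_all` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Sends exactly len bytes on a stream socket, retrying on short writes
   and on EINTR. Returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receives exactly len bytes from a stream socket. Returns 0 on success,
   -1 on error or if the peer closes the connection early. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* connection closed before all data arrived */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With more than 100 threads, each thread would typically own its own connection and call these helpers on it; whether TCP over the virtual interface is fast enough for this volume of data is something only measurement can answer.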

 

Neusoft Computed Tomography on Intel® Xeon® Processor E5-2600 v3

Background

Neusoft Medical Systems Co., Ltd. is a leading manufacturer of medical equipment including Computed Tomography (CT) [2,8], Magnetic Resonance Imaging (MRI) [3], X-ray, Ultrasound, Positron Emission Tomography (PET) [4], Linear Accelerator, and In Vitro Diagnostic (IVD) [5]. For more information about the company, see [1].

  • Server
  • Intel® Xeon® Processor
  • PET
  • IVD
  • Neusoft
  • Ct
  • MRI
  • Healthcare

Asynchronous data transfer does not work

I am trying to perform an asynchronous data transfer to an Intel Xeon Phi. Note that asynchronous computation works as expected. If I combine data transfer and computation in one offload statement, timing indicates that the data transfer is done synchronously while the following computation is done asynchronously.

A test example that illustrates the point is given below. The output is
0.928997 0.288048
which indicates that almost a second is spent in the asynchronous call, while only 0.28 seconds are spent waiting for that asynchronous call.

Offload inside parallel region: problem with private allocatable

Hi,

I'm trying to offload some computation inside a parallel OMP region. I have problems with a PRIVATE allocatable array. I paste here simple code that shows the problem. The first time, the offloaded code works as it should. The second time, the ALLOCATABLE variable p4 is not updated on the MIC.

The output is:
