Simple offloaded code, enormously time-consuming

Dear all,

I recently started using Xeon Phi cards for parallel programming, so I am still a newbie in this field.

I wrote this code as a simple example to start understanding this fascinating world, but I was surprised when I looked at the execution times.

When I run the code on the host, the execution time is 0.08 s. When I run it with pragma offload and pragma omp parallel for added, the execution time increases to 9 s!

I compiled both versions with -O3 optimization.

Is there something I am missing?
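
One common cause of a gap like this is that the first offload also pays for coprocessor initialization and data transfer, so it helps to time each offload call separately. The following is a minimal sketch of such a measurement, not the original poster's code: the function names timed_scale and now_seconds and the scaling loop are hypothetical. It assumes the Intel compiler's offload pragma; other compilers ignore the pragma and run the loop on the host.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Hypothetical kernel: multiply every element of a[0..n-1] by factor.
 * With the Intel compiler the loop is offloaded to the coprocessor;
 * compilers without offload support ignore the pragma and run the
 * loop on the host. Returns the elapsed wall-clock time in seconds,
 * including any data transfer performed by the offload.
 */
double timed_scale(float *a, int n, float factor)
{
    double start = now_seconds();

    #pragma offload target(mic) inout(a : length(n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] *= factor;
        }
    }

    return now_seconds() - start;
}
```

Calling timed_scale twice on the same array and printing both results shows whether the cost is concentrated in the first call. If the second call is much faster, the 9 s is mostly one-time setup rather than the loop itself.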


SLES11sp3 + MPSS 3.4.3 + MOFED IB scif issues


We have a number of iDataPlex dx360 M4 Server machines, each with 2 Xeon Phi Coprocessor 5110P cards and one Mellanox ConnectX-3 card. We're running SLES 11sp3 Linux on these machines using a 3.0.101-0.40-default kernel. I've installed MPSS 3.4.3 and updated the firmware, and almost everything seems to function.

The only problem I encounter is with InfiniBand.

The Last Line Effect


I have studied numerous errors caused by the Copy-Paste method and can assure you that programmers most often make mistakes in the last fragment of a homogeneous code block. I have never seen this phenomenon described in books on programming, so I decided to write about it myself. I called it the "last line effect".

Unix Signal Handling (Xeon-phi)


I am porting a code to Xeon Phi (using manual offload) in C++ and I am trying to catch the SIGINT signal so that I can free memory correctly before stopping the program. This program also uses OpenMP tasks for asynchronous I/O.

My first goal is to ignore the SIGINT signal with the function sigaction and the macro SIG_IGN. Unfortunately, my program can still be stopped by Ctrl+C. I also tried to block the SIGINT signal (with pthread_sigmask) before the omp parallel region and catch this signal in the master thread only, but without success.
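
For reference, the host-side part of this approach can be sketched as below. This is a minimal illustration, not the poster's code: the names ignore_sigint, catch_sigint, on_sigint, and stop_requested are hypothetical. It covers only the host process; whether the offload runtime or the coprocessor-side process handles SIGINT separately is not addressed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set to 1 by the handler; the main loop can poll it and clean up. */
volatile sig_atomic_t stop_requested = 0;

/* Async-signal-safe handler: only records that SIGINT arrived. */
static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Ignore SIGINT for the whole process. Returns 0 on success, -1 on error. */
int ignore_sigint(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Catch SIGINT and set stop_requested instead of terminating.
 * Returns 0 on success, -1 on error. */
int catch_sigint(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}
```

A program would call one of these functions before entering the parallel region and, in the catch_sigint case, check stop_requested at safe points to free memory before exiting. Signal dispositions are shared by all threads in a process, so they apply to the OpenMP worker threads as well.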

Odd behaviour regarding execution time vs number of threads



While porting an image processing library to the Xeon Phi, I stumbled upon a strange behaviour: the processing is about 20% faster when I set the number of threads to precisely 103 (I ran the processing multiple times using between 95 and 118 threads).

Connecting coprocessor to bridge to communicate with internet


I am running Ubuntu 14.04 with a Xeon Phi 31S1P and I have been trying to set up a bridge so that the Phi can access the internet. I have been having a lot of trouble and can't figure out what's wrong. I'm pretty sure the bridge itself is fine, but the Phi can't connect to it. Whenever I run the simple command to connect it to the bridge, I get this:

/var/mpss/mic0/etc/network# micctrl --network=static --bridge=br0 --ip=
  [Error] br0: Failed - required brctl command not installed
