Most efficient way for atomic updates on Xeon Phi

I found that the compiler generates a call to __kmpc_atomic_float4_add in the assembly code for the following two lines:

#pragma omp atomic
array[i] += 1.0;

This code performs poorly on Intel Xeon Phi when many threads are used. Is there any information about how __kmpc_atomic_float4_add is implemented? Are there better solutions for efficient and scalable atomic updates? Is it possible to use GCC intrinsics such as __sync_add_and_fetch() in offload regions?
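One common way to reduce contention in this kind of update is to avoid the atomic altogether: each thread accumulates into its own private copy of the array, and the copies are summed afterwards. The following is only a minimal sketch of that idea, not a description of how __kmpc_atomic_float4_add works internally. The function name scatter_increment, the index array idx, and the buffer priv are hypothetical names introduced for illustration, and the sketch assumes the array is small enough that one private copy per thread fits in memory.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/*
 * Hypothetical helper: performs array[idx[j]] += 1.0f for every j in [0, m)
 * without "#pragma omp atomic". Returns 0 on success, -1 if the private
 * buffers cannot be allocated.
 */
int scatter_increment(float *array, size_t n, const int *idx, size_t m)
{
    assert(array != NULL && idx != NULL);

    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif

    /* One zero-initialized private copy of the array per thread. */
    float *priv = calloc((size_t)nthreads * n, sizeof *priv);
    if (priv == NULL)
        return -1;

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        float *local = priv + (size_t)tid * n;

        /* Phase 1: each thread updates only its own copy, so no atomics. */
        #pragma omp for
        for (long j = 0; j < (long)m; ++j) {
            assert(idx[j] >= 0 && (size_t)idx[j] < n);
            local[idx[j]] += 1.0f;
        }
        /* The implicit barrier after the loop makes all copies visible. */

        /* Phase 2: each element of array is written by exactly one thread. */
        #pragma omp for
        for (long i = 0; i < (long)n; ++i) {
            float sum = 0.0f;
            for (int t = 0; t < nthreads; ++t)
                sum += priv[(size_t)t * n + i];
            array[i] += sum;
        }
    }

    free(priv);
    return 0;
}
```

The trade-off is memory: the sketch needs nthreads * n extra floats, which can be significant with the thread counts typical of Xeon Phi. If the compiler supports OpenMP 4.5, a reduction clause on an array section may express the same pattern more compactly. Whether either approach beats the atomic depends on how often threads actually collide on the same element, so it is worth measuring.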

Installation problem: 32-bit libraries not found on this system on Ubuntu 14.04 LTS

Hi all,

I just installed Ubuntu 14.04 LTS on my 64-bit laptop.

I downloaded Intel Fortran Composer XE for Linux - Version 2013 SP1.

During the installation I got an "Unsupported operating system" error, but I skipped it.

Later, I got:

Missing optional prerequisites

-- 32-bit libraries not found


I followed the instructions given in:

How to

For example, I have:

#pragma offload nocopy(a)
  a = malloc(sizeof(double)*ny*nx);

Now I want to initialize its first k rows with data from the host.

I can do something like:

inout = malloc(sizeof(double)*k*nx);
memcpy(inout, a, k*nx*sizeof(double));
#pragma offload in(inout: length(k*nx) alloc_if(1) free_if(1)) nocopy(a) in(nx, k)
 memcpy(a, inout, k * nx * sizeof(double));

Is there any way to avoid the temporary pointer `inout`?



MIC linking issues

I am getting an incompatibility error while linking a library with the -mmic flag. I don't know how to make this piece of code compatible with native MIC compilation.

x86_64-k1om-linux-ld: i386:x86-64 architecture of input file `libMisc.a(clock_time.o)' is incompatible with k1om output

//clock_time.c code

#include <time.h>

double MPI_Wtime(void);

double clock_time_()
{
  return MPI_Wtime();
}


offload overhead

When not using native mode, is there a way to disable the creation of memory buffers for an offload region? The CPU time it takes is so large that my accelerated program cannot achieve any speedup. Note that all the IN variables are scalars.

[Offload] [MIC 0] [Line]            144

[Offload] [MIC 0] [Tag]             Tag 1598

[Offload] [HOST]  [Tag 1598] [State]   Start Offload

[Offload] [HOST]  [Tag 1598] [State]   Initialize function __offload_entry_AcceleratorUtilitiesOp_C_144doArrayDa_cfaca3494cc6212aae7ad712694b42c4
