Server

Intel® Xeon® Processor E7 V2 Family New Reliability Features

 

The following article covers the reliability features at a glance. A comprehensive whitepaper on MCA recovery, including how to make applications recovery-aware, is available here: https://software.intel.com/en-us/articles/intel-xeon-processor-e7-880048002800-v2-product-family-based-platform-reliability

 

1) Introduction

 

Most efficient way for atomic updates on Xeon Phi

I found that the assembly code generated for the following two lines calls __kmpc_atomic_float4_add:

#pragma omp atomic
array[i] += 1.0;

Performance of this code is not good on Intel Xeon Phi when many threads are used. Is there any information about how __kmpc_atomic_float4_add is implemented? Are there any better solutions for efficient and scalable atomic updates? Is it possible to use GCC intrinsics such as __sync_add_and_fetch() in offload regions?
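One common way to reduce contention on atomic updates, independent of how __kmpc_atomic_float4_add is implemented, is to let each thread accumulate into a private copy and merge once at the end. The following is only a minimal sketch under assumptions: the function name `accumulate`, the index array `idx`, and the histogram-style access pattern are hypothetical, since the original post does not show how `i` is chosen. It tends to pay off only when the number of updates is much larger than the array length.

```c
#include <stdlib.h>

/* Hypothetical example: performs array[idx[j]] += 1.0f for j = 0..m-1.
 * Each thread first updates its own private copy without any atomics,
 * then merges that copy into the shared array with one atomic add per
 * element. Without OpenMP enabled, the pragmas are ignored and the code
 * runs serially with the same result. */
void accumulate(float *array, long n, const int *idx, long m)
{
    #pragma omp parallel
    {
        long i, j;
        /* Private, zero-initialized scratch copy for this thread. */
        float *local = calloc((size_t)n, sizeof *local);
        if (local == NULL)
            abort();

        /* Contention-free updates into the private copy. */
        #pragma omp for nowait
        for (j = 0; j < m; j++)
            local[idx[j]] += 1.0f;

        /* Merge step: n atomic adds per thread instead of one per update. */
        for (i = 0; i < n; i++) {
            #pragma omp atomic
            array[i] += local[i];
        }
        free(local);
    }
}
```

The trade-off is extra memory (one copy of the array per thread) and a merge cost proportional to the array length times the thread count, so this sketch is not a drop-in replacement when the array is large and the updates are sparse.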

How to

For example, I have

#pragma offload nocopy(a)
{
  a = malloc(sizeof(double)*ny*nx);
}

Now I want to initialize its first k rows with data from the host.

I can do something like:

inout = malloc(sizeof(double)*k*nx);
memcpy(inout, a, k*nx*sizeof(double));
#pragma offload in(inout: length(k*nx) alloc_if(1) free_if(1)) nocopy(a) in(nx, k)
{
 memcpy(a, inout, k * nx * sizeof(double));
}

Is there any way to avoid the temporary pointer `inout`?

Thanks

 

MIC linking issues

I am getting an incompatibility error when linking a library with the -mmic flag. I don't know how to make this piece of code compatible with native MIC compilation.

x86_64-k1om-linux-ld: i386:x86-64 architecture of input file `libMisc.a(clock_time.o)' is incompatible with k1om output

//clock_time.c code
#include <time.h>

double MPI_Wtime(void);

double clock_time_()
{
  return MPI_Wtime();
}

 

offload overhead

If we don't use native mode, is there a way to disable the creation of memory buffers in the offload region? The CPU time is so high that my accelerated program cannot achieve any speedup. Note that all the IN variables are scalars.

[Offload] [MIC 0] [Line]            144
[Offload] [MIC 0] [Tag]             Tag 1598
[Offload] [HOST]  [Tag 1598] [State]   Start Offload
[Offload] [HOST]  [Tag 1598] [State]   Initialize function __offload_entry_AcceleratorUtilitiesOp_C_144doArrayDa_cfaca3494cc6212aae7ad712694b42c4

Dell R7610 issue & offload options on Windows 7 x64

 

Hi,

I have installed the Xeon Phi development environment on Windows 7 Enterprise x64 on a Dell R7610.

First (I):

I have installed MPSS 3.2.1.

Sometimes, after the PC starts, the Xeon Phi is not seen in the Windows Device Manager.

After uninstalling and reinstalling MPSS, it is OK: the Xeon Phi is seen.

Second (II):

I am trying to make a personal benchmark using CAO offload mode (FFT with MKL and FFTW3).

I generate a 64-bit exe with Visual Studio and Intel C++ 14.0,

but at execution I got an:

How to hybrid MIC and CPU without copy-and-paste

Hello,

I know I can run a computation on the MIC and the CPU together by using the offload directive with signal and wait.

But I have one question: how can I do that without copying and pasting code?

For example, take a vector addition:

#pragma omp parallel for
for(i=0;i<N;i++) C[i] = A[i] + B[i];

I can make it hybrid like this:

#pragma offload target(mic:0) inout(C[0:N/2]: alloc(C[0:N/2])) signal(&sig)
{
#pragma omp parallel for
for(i=0;i<N/2;i++) C[i] = A[i] + B[i];
}

#pragma omp parallel for
for(i=N/2;i<N;i++) C[i] = A[i] + B[i];

#pragma offload_wait target(mic:0) wait(&sig)
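One possible way to avoid duplicating the loop body is to move it into a single function that is compiled for both the host and the coprocessor, and call that function once inside the offload statement and once on the host. This is a sketch only, under assumptions: the names `vec_add`, `hybrid_vec_add` and `TARGET_MIC` are hypothetical, device 0 is assumed, and the `in`/`out` clauses assume `A`, `B` and `C` are plain heap pointers. With a compiler that does not support offload, the pragmas are ignored and both halves simply run on the host.

```c
#include <stddef.h>

/* The Intel compiler defines __INTEL_OFFLOAD when offload is enabled;
 * with other compilers the attribute macro expands to nothing. */
#ifdef __INTEL_OFFLOAD
#define TARGET_MIC __attribute__((target(mic)))
#else
#define TARGET_MIC
#endif

/* Single copy of the kernel, usable on both host and coprocessor. */
TARGET_MIC void vec_add(double *C, const double *A, const double *B, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        C[i] = A[i] + B[i];
}

/* Hypothetical driver: first half offloaded asynchronously, second half
 * computed on the host while the offload is in flight. */
void hybrid_vec_add(double *C, const double *A, const double *B, int n)
{
    int half = n / 2;
    char sig;          /* only its address is used, as the signal tag */
    (void)sig;         /* silence "unused" warnings when pragmas are ignored */

    #pragma offload target(mic:0) in(A, B : length(half)) out(C : length(half)) signal(&sig)
    vec_add(C, A, B, half);

    /* Host works on the remaining elements with the same function. */
    vec_add(C + half, A + half, B + half, n - half);

    /* Block until the offloaded half has finished and C has been copied back. */
    #pragma offload_wait target(mic:0) wait(&sig)
}
```

The split point `half` is fixed at n/2 here to mirror the example above; in practice it would likely be tuned to the relative speed of the two devices.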

 
