Intel® Many Integrated Core Architecture

Two-dimensional array offload issue

I have a two-dimensional dynamic array that I offload to the Phi. I don't really pass any data; all I want is to allocate the memory once via a transfer and then access that memory via nocopy in each later iteration.

void foo()

unsigned int ** twoDimArray = new ... etc ... [n*m]

#pragma offload_transfer target(mic:MIC_DEV) in(twoDimArray :length(n*m) alloc_if(1) free_if(0))

while (condition) {

//nocopy offload each iteration of external loop
#pragma offload target(mic:MIC_DEV) nocopy(twoDimArray :length(n*m) alloc_if(0) free_if(0))
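The excerpt above is cut off and does not show how twoDimArray is actually allocated, so the following is only a sketch under an assumption: that the goal is one contiguous block of n*m unsigned int elements. A length(n*m) clause describes n*m consecutive elements starting at the named pointer, which does not match an unsigned int ** that points at separate row pointers. One common workaround is to keep a single flat buffer and compute the row-major index by hand. The helper names flat_index, make_flat_array and fill_and_sum are hypothetical and not from the original post, and the offload pragmas are deliberately left out so the block builds with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major index of element (row, col)
// in a flat buffer that holds n rows of m columns each.
inline std::size_t flat_index(std::size_t row, std::size_t col, std::size_t m) {
    return row * m + col;
}

// Hypothetical helper: one contiguous, zero-initialized block of n*m elements.
inline std::vector<unsigned int> make_flat_array(std::size_t n, std::size_t m) {
    return std::vector<unsigned int>(n * m, 0u);
}

// Fill the flat buffer through a single raw pointer and sum it again.
// The raw pointer is the kind of object a length(n*m) clause could describe,
// because all n*m elements sit directly behind it.
inline unsigned long long fill_and_sum(std::size_t n, std::size_t m) {
    std::vector<unsigned int> flat = make_flat_array(n, m);
    assert(flat.size() == n * m);

    unsigned int* data = flat.data();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            data[flat_index(i, j, m)] = static_cast<unsigned int>(i * m + j);
        }
    }

    unsigned long long sum = 0;
    for (std::size_t k = 0; k < n * m; ++k) {
        sum += data[k];
    }
    return sum;
}
```

With this layout, what would have been written as twoDimArray[i][j] becomes data[flat_index(i, j, m)], and the same single pointer can be named in both the initial transfer and the later nocopy clauses.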


Preventing execution of remainder loop on Xeon Phi coprocessor

Hey everyone, consider the sample code below.

Compiling with ifort -O3 -align array64byte -openmp -vec-report6 reports, in effect, that nlist is aligned, that the SIMD loop was vectorized, and that position is 64-bit indexed in the offloaded inner loop at line 93. In the remainder loop, as expected, nothing is aligned, but the remainder code is still vectorized. The !dir$ vector aligned directive prevents the creation of a peel loop, as I want.

Advanced Computer Concepts For The (Not So) Common Chef: Introduction

While talking to a very intelligent but non-engineer colleague, I found myself needing to explain the threading and other components of the Intel® Xeon Phi™ x100 and x200 architectures. The first topic that came up was hyper-threading, and more specifically, the coprocessor’s version of hyper-threading. Wracking my brain, I finally hit upon an analogy that seemed to suit: the common kitchen.

Offload with persistent MIC buffer: are global pointers required?

We have been through this once, but here we go again, because the latest results confuse me. My question is: in order to re-use a previously allocated memory buffer on the coprocessor, is the programmer required to supply a global pointer declared with __attribute__((target(mic))) in pragma offload?

The reason for this question is that I observe that global variables work in all cases, but local variables work in all cases except one (ouch!). So either it is a bug in the compiler or COI, or it is a sign that one programming practice is better than another.

Xeon Phi 7120P always runs at lowest frequency

I recently installed one 7120P in one of my servers. It seems to be working fine, but I noticed that it always runs at the lowest available frequency. Even when I run the benchmark application that comes with the Intel compiler, the frequency stays at 0.57 GHz.

Any idea about this?

Here is some information about my machine

Expected performance gain ... 5960X vs Xeon Phi?

I am a retired theoretical physical chemist with a long association with computers and computing.
As briefly as possible, my interests are in the behavior of fluids at a phase boundary, such as a real gas at a solid
surface: the attractive forces of the solid cause an increased concentration (density) of the gas in the region near the surface, 
a measurable phenomenon called "adsorption". Thermodynamics requires that, at equilibrium at a constant temperature and 

Poor MKL Dfti complex to complex performance


I'm new to MIC programming and trying to get a grip on how to do things with the beast. I stumbled across very bad FFT performance (using a matrix size often used at our institution) for DFTI complex-to-complex transforms. In the following, no OMP, KMP, or MKL variables are set, except where stated. Setting the number of threads or specifying the placement does not change much for this comparison: the MIC is much slower than the host!

Any hints how to improve the situation?


