Vectorization

IPPM WarpAffine Center

Hi experts,

I want to perform an affine warp of my image. Is there a function that supports warping with respect to the center of the image (and not the top-left origin of the ROI)? I tried ippiWarpAffineLinear_32f_C1R for my case, but without success.

Any idea? An example would be great :)!
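One common way to get a warp about the image center is to fold the centering into the translation column of the coefficients before passing them to the warp call. The sketch below does only that arithmetic and calls no IPP function. It assumes a 2x3 row-major coefficient array used as a forward mapping; `recenter_affine` and `apply_affine` are hypothetical helper names, and whether the IPP call expects exactly this layout and direction should be checked against its documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Adjust a 2x3 affine matrix in place so that its linear part
 * (rotation, scale, shear) acts about (cx, cy) instead of the origin.
 * The mapping is assumed to be:
 *   x' = m[0][0]*x + m[0][1]*y + m[0][2]
 *   y' = m[1][0]*x + m[1][1]*y + m[1][2]
 * Equivalent to: shift by (-cx, -cy), apply m, shift back by (cx, cy). */
void recenter_affine(double m[2][3], double cx, double cy)
{
    assert(m != NULL);
    m[0][2] += cx - m[0][0] * cx - m[0][1] * cy;
    m[1][2] += cy - m[1][0] * cx - m[1][1] * cy;
}

/* Apply the matrix to a single point; useful for checking the result. */
void apply_affine(double m[2][3], double x, double y, double *xo, double *yo)
{
    assert(m != NULL && xo != NULL && yo != NULL);
    *xo = m[0][0] * x + m[0][1] * y + m[0][2];
    *yo = m[1][0] * x + m[1][1] * y + m[1][2];
}
```

For an image of width w and height h, the center would typically be ((w - 1) / 2.0, (h - 1) / 2.0) in pixel-center coordinates, which is itself an assumption about the coordinate convention. After recentering, the center point maps to itself, which is a quick sanity check.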

CGEMM performance strangeness on Haswell CPUs vs. Sandy Bridge

Hi All

I have investigated the performance of the CGEMM algorithm using both my own Sandy Bridge CPU and my colleague's newer computer with a Haswell CPU. Performance is measured as the number of complex multiply-accumulate operations per second, denoted here as CMacs. I don't use any scaling and I don't add the previous matrix, so I only calculate C = A * B.

The setup:

Number of rows in A = 2^16

Number of columns in A = 16

Number of columns in B = 256

GCMacs = A_r * A_c * B_c / time * 1e-9
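As a worked instance of the formula, here is a minimal sketch that turns a measured wall time into GCMacs for the sizes listed above. The function name `gcmacs` is hypothetical and the timing itself is left to the caller.

```c
#include <assert.h>

/* Giga complex multiply-accumulates per second for C = A * B,
 * where A is a_rows x a_cols and B is a_cols x b_cols.
 * `seconds` is the measured wall time of one product. */
double gcmacs(long a_rows, long a_cols, long b_cols, double seconds)
{
    assert(seconds > 0.0);
    /* Convert to double first so the product cannot overflow a 32-bit long. */
    double cmacs = (double)a_rows * (double)a_cols * (double)b_cols;
    return cmacs / seconds * 1e-9;
}
```

With the setup above, one product is 2^16 * 16 * 256 = 2^28 complex MACs, so a run that takes 0.1 s would correspond to roughly 2.68 GCMacs. If a comparison in real floating-point operations is wanted, a complex MAC is usually counted as 8 real flops (4 multiplies and 4 adds).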

Single Entry and Single Exit Criteria for loop vectorization.

I was reading "A Guide to Vectorization with Intel C++ Compilers": https://software.intel.com/sites/default/files/8c/a9/CompilerAutovectori...

I am referring to the single-entry, single-exit criterion on page 8. I have tried two variants: a) break and b) continue.

A) Break

void no_vec(float a[], float b[], float c[])
{
    int i = 0;
    while(i < 100)
    {
        a[i] = b[i] * c[i];
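
For comparison, a self-contained sketch of the contrast that the single-entry, single-exit criterion draws. The function names `multiply_until_negative` and `multiply_all` are hypothetical and not taken from the guide, and whether a particular compiler version vectorizes either loop is best confirmed with its vectorization report.

```c
#include <assert.h>
#include <stddef.h>

/* Data-dependent early exit: the trip count is not known when the loop
 * starts, so this loop has two exits (i == n, or a negative product).
 * Returns the index at which the loop stopped. */
size_t multiply_until_negative(float *a, const float *b, const float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t i = 0;
    while (i < n) {
        a[i] = b[i] * c[i];
        if (a[i] < 0.0f)
            break;          /* second exit, depends on the data */
        ++i;
    }
    return i;
}

/* Single entry, single exit: the trip count n is known on entry and
 * nothing inside the body can leave the loop early. */
void multiply_all(float *a, const float *b, const float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] * c[i];
}
```

A `continue` differs from a `break` in that it skips the rest of one iteration but does not add an exit from the loop, so the trip count stays countable.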

Offload with persistent MIC buffer: are global pointers required?

We have been through this once, but here we go again, because the latest results confuse me. My question is: in order to re-use a previously allocated memory buffer on the coprocessor, is the programmer required to supply a global pointer with __attribute__((target(mic))) in pragma offload?

The reason for this question is that I observe that global variables work in all cases, but local variables work in all cases except one (ouch!). So either it is a bug in the compiler or COI, or it is a sign that one programming practice is better than another.
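For context, a minimal sketch of the persistent-buffer pattern in question, written with the Intel offload pragmas and their alloc_if / free_if modifiers. The function names are hypothetical, the pointer is passed as an ordinary parameter purely for illustration, and the sketch does not settle whether a global pointer is required. Compilers without offload support ignore the pragmas and run everything on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: allocate the buffer on the coprocessor, copy the data in,
 * and keep the allocation alive after the transfer (free_if(0)). */
void buffer_upload(float *buf, size_t n)
{
    assert(buf != NULL);
    #pragma offload_transfer target(mic:0) in(buf : length(n) alloc_if(1) free_if(0))
}

/* Step 2: reuse the resident buffer. nocopy with alloc_if(0) asks the
 * runtime to neither allocate nor transfer, only to look the buffer up. */
float buffer_sum(float *buf, size_t n)
{
    assert(buf != NULL);
    float sum = 0.0f;
    #pragma offload target(mic:0) nocopy(buf : length(n) alloc_if(0) free_if(0)) inout(sum)
    {
        for (size_t i = 0; i < n; ++i)
            sum += buf[i];
    }
    return sum;
}

/* Step 3: release the coprocessor allocation when it is no longer needed. */
void buffer_release(float *buf, size_t n)
{
    assert(buf != NULL);
    #pragma offload_transfer target(mic:0) nocopy(buf : length(n) alloc_if(0) free_if(1))
}
```

The three calls have to refer to the same host buffer for the reuse to make sense, which is where the choice between a local and a global pointer comes into play.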

Xeon Phi 7120P always runs at lowest frequency

I recently installed a 7120P in one of my servers. It seems to be working fine, but I noticed that it always runs at the lowest available frequency. Even when I run the benchmark application that ships with the Intel compiler, the frequency stays at 0.57 GHz.

Any idea about this?

Here is some information about my machine

Expected performance gain ... 5960X vs Xeon Phi?

Hello...
I am a retired theoretical physical chemist with a long association with computers and computing. As briefly as possible, my interests are in the behavior of fluids at a phase boundary, such as a real gas at a solid surface: the attractive forces of the solid cause an increased concentration (density) of the gas in the region near the surface, a measurable phenomenon called "adsorption". Thermodynamics requires that, at equilibrium at a constant temperature and

Poor MKL Dfti complex to complex performance

Hello,

I'm new to MIC programming and trying to get a grip on how to do things with the beast. I stumbled across very bad FFT performance (using a matrix size often used at our institution) for DFTI complex-to-complex transforms. In the following, no OMP, KMP, or MKL variables are set, except where stated. Setting the number of threads or specifying the placement does not change much for this comparison: the MIC is much slower than the host!

Any hints on how to improve the situation?

Sincerely,

HC
