Can INT16 achieve 2x throughput compared with INT32 on Broadwell?


According to Gen8.pdf,

'These units can SIMD execute up to four 32-bit floating-point (or integer) operations, or SIMD execute up to eight 16-bit integer or 16-bit floating-point operations.'

It seems that INT16 can achieve 2x peak throughput compared with INT32.     

In Gen8.pdf, the table shows that for HD Graphics 5300, 32-bit integer IOPS = 192 IOP/cyc. Does that mean that 16-bit integer IOPS = 192*2 IOP/cyc?

Is my understanding right?
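
The arithmetic behind the question can be written out as a rough sketch. It assumes the 16-bit rate scales purely with the quoted SIMD widths (four 32-bit lanes versus eight 16-bit lanes). The quoted sentence says "up to", so this is at best an upper bound, and it may not hold for every integer instruction. The constant and function names are illustrative only.

```python
# Peak-rate estimate only: assumes the 16-bit rate scales with SIMD lane count.
LANES_32BIT = 4  # "up to four 32-bit ... operations" (quoted from Gen8.pdf)
LANES_16BIT = 8  # "up to eight 16-bit integer ... operations" (quoted from Gen8.pdf)
INT32_IOPS_PER_CYCLE = 192  # HD Graphics 5300 figure cited in the post


def estimated_int16_peak(int32_iops_per_cycle: int) -> int:
    """Scale the 32-bit peak by the ratio of 16-bit to 32-bit SIMD lanes."""
    return int32_iops_per_cycle * LANES_16BIT // LANES_32BIT


print(estimated_int16_peak(INT32_IOPS_PER_CYCLE))  # 384
```

Whether real kernels reach this figure would have to be confirmed by measurement or by the hardware documentation.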


Problems with reduction done in CPU

Hi all. I have been trying to code reductions for CPU and GPU. The kernels attached below work really well on Intel GPUs and Nvidia GPUs. But when I compile for the CPU (Intel), the results are not consistent: sometimes the result is right, sometimes it is wrong. There are two kernels. reduction_vector is called many times by the host. When the global_size has been reduced to local_size, I issue complete_vector to finalize the reduction.


__kernel void reduction_vector(__global int* data, __local int* partial_sums)
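
The kernel bodies are not shown here, so the following is only a host-side Python model of the multi-pass scheme described above, not the poster's code. The names `reduction_vector_model`, `complete_vector_model` and `reduce_all` are hypothetical. The sketch assumes that local_size is a power of two and that the input length is a multiple of local_size at every pass.

```python
def reduction_vector_model(data, local_size):
    """Model one launch: each work-group tree-reduces its slice to one partial sum."""
    assert local_size > 0 and local_size & (local_size - 1) == 0, "power of two assumed"
    assert len(data) % local_size == 0, "length must be a multiple of local_size"
    group_results = []
    for group_start in range(0, len(data), local_size):
        # Models the __local partial_sums buffer of one work-group.
        partial_sums = list(data[group_start:group_start + local_size])
        stride = local_size // 2
        while stride > 0:
            # In a real kernel, barrier(CLK_LOCAL_MEM_FENCE) is needed before each
            # step so that every work-item sees the previous step's writes.
            for lid in range(stride):
                partial_sums[lid] += partial_sums[lid + stride]
            stride //= 2
        group_results.append(partial_sums[0])
    return group_results


def complete_vector_model(data):
    """Model the final launch: a single work-group reduces what is left."""
    return reduction_vector_model(data, len(data))[0]


def reduce_all(data, local_size):
    """Host loop: repeat the partial reduction until one work-group remains."""
    data = list(data)
    while len(data) > local_size:
        data = reduction_vector_model(data, local_size)
    return complete_vector_model(data)


print(reduce_all(range(64), 4))  # 2016
```

A missing barrier between reduction steps is one common reason a kernel gives correct results on one device and intermittent errors on another. Whether that applies to these kernels cannot be determined from the excerpt.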


Sandy Bridge, INDE OK while IOC64 and runtime fails (W8.1 -I7 2760)


I have a piece of code that runs fine on Ivy Bridge and later CPUs. On Sandy Bridge (2760) it will not vectorize, and hence it will not perform.

Now, the same code will compile and run if we use your competitor's runtime.

Our problem is that the INDE compiler will vectorize it, but when we use the IOC64 directly it simply reports a compile failure. Is it safe to assume that the toolkit is using the same compiler?

Further, the -Scholar mode does not work?

Any ideas?



OpenCL runtime 15.1 for Ubuntu 14.04

I was trying to install Intel OpenCL Runtime 15.1 for Ubuntu. I have a Xeon E3 Haswell.

Now, the release notes clearly state that 14.04 *is supported*, and the installation instructions are pretty clear.

BUT, when I try to download the runtime from the official download page

Code Builder missing VS breakpoints

I'm working with Code Builder. I've been running happily for six months with the 2014 OpenCL applications SDK under VS2010. I thought I knew all the tricks to ensure kernel breakpoints were hit, but one (large) new project consistently fails to hit breakpoints. I figured I couldn't go to the forums with an old version, so I tried installing INDE with the restricted options for just a Code Builder installation, but that just stripped out my old VS2010 Code Builder extensions.

OpenCL Code Builder and runtime vs MPSS version


We would like to provide OpenCL support for Intel Core and Xeon processors and Intel Xeon Phi coprocessors on our cluster. In the online documentation I read that "For Intel® Xeon Phi™ coprocessor support, you must install the OpenCL runtime 14.2 here, and the Intel® Manycore Platform Software Stack (Intel® MPSS) 3.3 here".
