Linux Iris Kernel Launch Overhead


Again, I've been trying to characterize when it makes sense to offload computation from the CPU to the IGP (i7-5775c CPU vs Iris Pro IGP). I noticed that for very simple kernels (e.g. a single FMA or a min/max operation) the CPU outperforms the IGP by up to 50%, and on investigation it seems that kernel launch overhead has a lot to do with it. Some results to explain:

FMA Kernel (using FMA_LOOP = 1):

Uninstalling OpenCL SDK 1.5 on Windows 10 after upgrade from Windows 7

I installed OpenCL SDK 1.5 when my computer had Windows 7 Home. It is now upgraded to Windows 10 Home. When I try to install "OpenCL runtime for Intel Core and Xeon Processors" I get the error message "An out-of-date version of the OpenCL runtime package is already installed on your machine. Remove any previous installation before you continue."

Linux. Need to login locally first before OpenCL works using remote connection (SSH or VNC)


Installed CentOS 7.2 (and 7.1) on a BDW processor. I did a fresh install to rule out other factors, and I also installed on another BDW desktop. However, the results are consistent....

Has anyone encountered this issue before?



Intel SDK unhappy with Intel OpenCL driver?

Hello, I performed these steps:

1. Buy a Dell Precision M4800

2. Install CentOS 7


lspci | grep VGA | grep Intel
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)

4. Execute these instructions in accordance with the Intel OpenCL Driver and SDK installation guides:

Iris 6100/Linux: can't get asynchronous kernel execution

I can't seem to get the GPU to run OpenCL kernels asynchronously/in parallel, which is crucial for my use case. Without this I can't make full use of the GPU's compute resources. I use local memory so each workgroup is confined to a single subslice, and the number of workgroups in each enqueued command isn't sufficient to fill more than one subslice of the GPU anyway.

installation problems on desktop

Hi, I want to test how much faster GPU processing is than conventional processing, so I'm trying to set up an environment for OpenCL and OpenCV.

However, I ran into a problem installing the OpenCL SDK on my computer.

My CPU is an i5-4590 with Intel HD Graphics 4600. The OS is 32-bit Windows 7.

As far as I know, SDK versions before the 2016 release supported OpenCL 1.2 on 32-bit operating systems, but they are no longer available.

Could you please suggest a solution? Or is there any way to get past versions?

Which version of visual studio do I need?


The latest OpenCL SDK seems to work best as a VS plugin, because when I open the Code Builder GUI, the first thing it says is "deprecated".

However, which version of Visual Studio do I need? Can I use an Express edition of Visual Studio? I tried Visual Studio Express 2012 and it does not work: I cannot see the popup for converting a project to an OpenCL project!

How to reinstall OpenCL for Intel Core

My system :

Processor    Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 3301 Mhz, 6 Core(s), 12 Logical Processor(s)

Windows 10 64 bit Pro.

I had an install of the Intel Core OpenCL drivers that I was using with Luxrender - the graphics software.

I needed to uninstall it briefly due to some problems with old AMD drivers causing issues - AMD had not cleanly uninstalled all of its gubbins - including their OpenCL driver.

Now if I go to 'Apps' in Windows 10 I can see that the Intel OpenCL driver is not present as expected.

Questions on loop and math function overhead on Intel HD GPU

I have an Intel HD 4600 GPU and noticed some performance discrepancies when running a microbenchmark that calls built-in math functions over a large number of loop iterations (arithmetic operators are fine). The results are compared against running the same microbenchmark on the CPU, and against running the standard C math functions in a loop (with vectorisation and optimizations avoided). So my question is this: is there a large loop or math function overhead when executing a kernel on an Intel HD GPU?
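For the CPU side of such a comparison, a minimal timing harness might look like the sketch below. It is not the poster's benchmark: `time_math_loop` and `cheap_poly` are hypothetical names, and `cheap_poly` is only a stand-in so the example builds without a math library. The `volatile` accumulator is one way to keep the compiler from removing or vectorising the loop.

```c
#include <time.h>

/* Hypothetical stand-in for the function under test. In a real run you
   would pass a standard math function such as sinf or expf instead. */
static float cheap_poly(float x)
{
    return x * (1.0f - x * x * 0.1666667f);
}

/* Calls fn `iters` times, feeding each result into the next call.
   The volatile accumulator forces a load and store on every iteration,
   so the loop cannot be optimized away or vectorised.
   Returns the CPU time used in seconds; the final value goes to *result. */
static double time_math_loop(float (*fn)(float), long iters,
                             float seed, float *result)
{
    volatile float x = seed;
    clock_t t0 = clock();
    for (long i = 0; i < iters; ++i)
        x = fn(x);
    clock_t t1 = clock();
    *result = x;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by `iters` gives a per-call cost that can be set against the per-iteration cost measured on the GPU. The call through a function pointer adds a small fixed cost per iteration, so very cheap functions will look slower than they are.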

i7-5775c Iris Pro vs CPU performance


I'm benchmarking the i7-5775c's 4 CPU cores against its Iris Pro 6200 for simple OpenCL kernels. I guess you could say I want to know when it makes sense to offload computation onto the IGP. One experiment involves each thread executing many FMA operations on a single input element to measure computational speed. I'm surprised to see the IGP outperform the CPU by nearly 9x, and by 18x with hyper-threading disabled:

OpenCL kernel:
void kernel fmaKernel(global float * out){

  float sum = out[get_global_id(0)];
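As a point of comparison, a plain-C reference for this kind of per-element FMA loop could be sketched as follows. This is not the poster's kernel, whose body is cut off above: the loop count `FMA_LOOP`, the constants 0.5f and 1.0f, and the names `fma_reference` and `fma_gflops` are all assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed iteration count; the value used in the original benchmark
   is not shown. */
#define FMA_LOOP 1000

/* Applies FMA_LOOP dependent multiply-adds to every element, in place.
   Each outer iteration corresponds to one work-item of a kernel. */
static void fma_reference(float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float sum = out[i];
        for (int k = 0; k < FMA_LOOP; ++k)
            sum = sum * 0.5f + 1.0f;  /* one multiply-add = 2 FLOPs */
        out[i] = sum;
    }
}

/* Converts an element count and an elapsed time into GFLOP/s,
   counting each multiply-add as two floating-point operations. */
static double fma_gflops(size_t n, double seconds)
{
    return 2.0 * (double)n * FMA_LOOP / seconds / 1e9;
}
```

With these constants every element converges towards 2.0, which gives a quick correctness check for a device run using the same constants. Reporting both sides in GFLOP/s makes it easier to see how much of a gap comes from arithmetic throughput rather than fixed costs.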
