Intel LLVM Optimizer Optimization Flags

Hi all,

I'm using the Intel offline compiler and the LLVM-based optimizer with Intel-specific optimizations (oclopt). For the optimizer I have used the regular -O optimization levels (O1, O2, O3), which apply many optimizations in two passes. Since plenty of individual optimizations are available, I'd like to know which optimization flags potentially have a big impact on performance.

Can you please suggest some optimizations?

Best regards,

Same kernel but huge performance difference under Linux and Windows


I have managed to run my kernel on iGPU under Linux and Windows.

Officially, Linux does not support running kernels on the iGPU, but the open-source OpenCL project "beignet" comes to the rescue.

Below are the performance results for my kernel (a deblocking filter in HEVC). The times (in seconds) were not obtained by binding an event to the kernel launch in OpenCL, since that also depends on the OpenCL runtime implementation under Windows and Linux; instead, they were obtained with host-side CPU profiling utilities.

                      H2D     Kernel     D2H

Memory Leak in Windows CPU OCL 1.2/2.0, not so much GPU 1.2

Hello world,

I'm developing an asynchronous Windows application and have noticed a strange loss of system memory. My application internally tracks memory usage, and when not using OpenCL at all it matches what the system reports through taskmgr. What's curious is that the size of the memory leak depends on which OpenCL version and device I use. Summarizing what taskmgr reports:

No OpenCL (vanilla C code): ~8 MB
OpenCL 2.0 Experimental CPU: ~1.2 GB
OpenCL 1.2 CPU: ~350 MB
OpenCL 1.2 GPU (HD 4600): ~40 MB

Which fine grain SVM features are supported in the current Gen8 driver?

I don't have Broadwell hardware in front of me yet, so can you tell me which fine-grain SVM capabilities are supported by the latest driver on Gen8 devices? Just FINE_GRAIN_BUFFER?

If FINE_GRAIN_SYSTEM is supported, can an 8-16 GB host address space be shared?

The OpenCL 2.0 SVM article does a nice job summarizing the capability bits.  Can you list which are supported in the .4080 driver and which might eventually be supported?

Any way to coax HD Graphics IGP into SIMD4 or SIMD4x2 mode?

I asked a similar question last year and want to know whether there is any way to coax the compiler into mapping "vectorized" code onto the IGP.

More specifically, I'd like to launch a workgroup where each work item is a SIMD4 or SIMD4x2 vector and the number of vector registers per work item might approach 128.

How to build native OpenCL kernel by assembly and load this kernel


I want to learn how to build a native OpenCL kernel from assembly and load it. I can already generate assembly from the OpenCL kernel and compile it:

ioc64 -cmd=build -device=co -asm=file.s
icc -mmic -c file.s -o kernel

But how do I load this kernel into the OpenCL runtime? My current approach did not work:

OpenCL 1.2 on Intel Core T8300 with Linux


I installed "intel_code_builder_for_opencl_2015_ubuntu_5.0.0.43_x64.tgz" on Ubuntu Linux.

My CPU is an Intel(R) Core(TM)2 Duo CPU T8300.

I cannot tell from the documentation whether the drivers support this CPU. It only states Intel® Core™ Processors.

I successfully get a platform ID and a device ID. I can also query the device capabilities with clGetDeviceInfo, and clGetDeviceInfo reports true for CL_DEVICE_AVAILABLE.

But clCreateContext fails with CL_DEVICE_NOT_AVAILABLE.

So is this CPU supported?
