OpenCL*

Intel provides a new integrated OpenCL development experience

OpenCL support at Intel is now going mainstream with full integration into Intel’s portfolio of software development suites. The features of the Intel® SDK for OpenCL™ Applications are now integrated into various development tools under the new name OpenCL™ Code Builder.

The different solutions are tailored to the target development environments:

Apparent memory leak and performance problems

Dear all,

My company is developing a scientific application with ~30000 registered users. We have encountered some problems with OpenCL support on Windows 8.1, using the latest driver (10.18.14.4080) and testing on the built-in GPU of a Core i7 4770. The application is 32-bit.

1) A memory-leak-like issue: the application has a non-performance-critical loop that looks like this:

- Create kernels
for (i = 0; i < 100; i++)
{
  - Create OpenCL buffers
  - Run kernels
  - Free OpenCL buffers
}

Does the INDE API Trace feature create another OpenCL platform?

Hi,

INDE API Trace was working wonderfully for a few days and then suddenly stopped returning any trace information. I started investigating and noticed that, when it is enabled, another Intel OpenCL platform shows up at runtime when I query all available platforms. Using this newly created OpenCL platform, which again only shows up when API Trace is enabled, I can get the trace results I was looking for. Why it worked two days ago without any adjustments to my code is still a mystery to me.

Broadwell HD 6000 significantly more efficient than Haswell HD 4600

I'm seeing a solid per-EU performance improvement on a pure integer math kernel.  

I'm comparing a 950 MHz 48 EU HD 6000 vs. a 1200 MHz 20 EU HD 4600.  Both have similar DDR3 bandwidth.

Adjusted for clock speed, the kernel shows a 60% boost in throughput per EU.  Without adjusting for clock speed, the HD 6000 still shows +26% over the HD 4600.

The improved integer throughput is a nice feature!
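The two percentages in this post are related by the clock ratio, which a few lines of arithmetic can confirm. This is only a sketch: it assumes the +26% figure is per-EU throughput before clock adjustment (my reading of the post), and the variable names are illustrative.

```python
# Figures quoted in the post.
HD6000_MHZ = 950
HD4600_MHZ = 1200

# Assumption: +26% is the per-EU throughput gain of the HD 6000 over the
# HD 4600 before any clock adjustment.
per_eu_gain = 1.26

# Adjusting for clock speed means comparing per-EU throughput per MHz,
# so the unadjusted gain is scaled by the ratio of the two clocks.
clock_ratio = HD4600_MHZ / HD6000_MHZ
per_eu_per_clock_gain = per_eu_gain * clock_ratio

print(f"clock ratio: {clock_ratio:.3f}")
print(f"clock-adjusted per-EU gain: {(per_eu_per_clock_gain - 1) * 100:.0f}%")
```

The clock-adjusted result comes out at about 59%, consistent with the roughly 60% quoted above.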

OpenCL Code Builder Deep Analysis error: "Cannot return a color for more than 72"

Deep Analysis returns the following error and shows an entirely blank "Execution Duration" tab:

The remaining tabs are populated.

I'm running with a global/local size of 2688/224 work-items, which is one SIMD8 per hardware thread on an HD 6000.
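The "one SIMD8 per hardware thread" claim can be checked with simple arithmetic. The sketch below assumes 48 EUs for the HD 6000 (the figure quoted in another post in this listing) and 7 hardware threads per EU, which is my assumption about this GPU generation and is not stated in the post.

```python
GLOBAL_SIZE = 2688   # global work-items, from the post
LOCAL_SIZE = 224     # work-items per work-group, from the post
SIMD_WIDTH = 8       # SIMD8, from the post

EUS = 48             # assumption: HD 6000 EU count
THREADS_PER_EU = 7   # assumption: hardware threads per EU

work_groups = GLOBAL_SIZE // LOCAL_SIZE           # number of work-groups
simd8_per_group = LOCAL_SIZE // SIMD_WIDTH        # SIMD8 threads in one work-group
simd8_total = GLOBAL_SIZE // SIMD_WIDTH           # SIMD8 threads in the whole launch
hardware_threads = EUS * THREADS_PER_EU           # hardware threads on the GPU

print(work_groups, simd8_per_group, simd8_total, hardware_threads)
```

Under these assumptions the launch yields 336 SIMD8 threads for 336 hardware threads, so every hardware thread gets exactly one.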

Is clBuildProgram needed in conjunction with clCreateProgramWithBinary?

Hi,

While trying to develop a standalone test case for a prior question, I noticed that offline compilation seems to behave differently for CPU and GPU. Per the OpenCL spec, my understanding is that I should be able to reuse compiled kernels (produced either through ioc32/ioc64 or through clCreateProgramWithSource/clBuildProgram). When using a GPU device, I can load such a precompiled kernel through clCreateProgramWithBinary and it is ready to use. The CPU device, however, requires me to call clBuildProgram again, which from a performance standpoint defeats the purpose of precompiling my kernels.

Intel LLVM Optimizer Optimization Flags

Hi all,

I'm using the Intel offline compiler and the LLVM-based optimizer with Intel-specific optimizations (oclopt). For the optimizer I have used the regular O optimization levels (O1, O2, O3), which include many optimizations in two passes. As there are plenty of individual optimizations available, I'm interested to know which optimization flags potentially have a big impact on performance.

Can you please suggest some optimizations?

Best regards,
Robert
