OpenCL 1.2?

Mohamed Amine BERGACH

Hi everyone, can you tell me whether the new OpenCL 1.2 specification will be implemented soon, in the next version of the Intel OpenCL SDK? Do you have any indication?

Arnon Peleg (Intel)

Hi,

Intel had a significant impact on the OpenCL 1.2 spec, as you can read in the Khronos press release. However, we do not use this forum to discuss unannounced products.

Nevertheless, we will be happy to get your input on which new features in the spec are valuable to you.

Arnon

Mohamed Amine BERGACH

Yes, I had seen that Intel is one of the major contributors to the OpenCL specs. For me, OpenCL 1.1 doesn't give me enough flexibility to use my hardware, and OpenCL 1.2 now has that feature (device partitioning), which means more freedom, more optimization, and better software. I'm waiting for the next Intel implementation of OpenCL 1.2!

Mohamed

Hello Mohamed,

Please note the existing version of the Intel OpenCL SDK already supports device partitioning as an EXT extension. Have you tried it and found it insufficient? OpenCL 1.2 doesn't add much beyond the EXT extension of OpenCL 1.1 for Device Fission, so if the current implementation is lacking, we'd like to hear so we can target your needs in a future release.

Thanks,
Doron Singer

Hi, what about variable-length arrays in kernels? I think they are the major obstacle for algorithms that don't produce a fixed number of results, for example streamline generation. Could this be supported in the future, or could some mechanism be provided to help developers? Thanks!

There's a mechanism in CUDA for something like this: malloc and free can be called in a __device__ function, provided you have set aside a specific amount of heap memory for the kernel call. In OpenCL, you can pass in a large buffer as an extra argument, together with its int elementCount and a __global volatile int* elementsUsed (or counter_t elementsUsed), and manage the memory in it manually, using atomic operations to advance the next available index into the buffer. On any platform that supports global atomic operations, you could implement your own malloc/free this way. It's a shame it's not part of the OpenCL spec, but there's plenty of time before OpenCL 2.0 comes out to suggest it, and plenty of time for companies to implement their own extensions providing malloc/free support in kernels, along with a mechanism to query the maximum and the required size of the dynamic heap for a kernel and context. Not all OpenCL devices would be able to do this efficiently, but CPUs most certainly could, and already do.
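The pre-allocated-buffer approach can be sketched on the host as follows. This is a minimal C++ emulation, not OpenCL code: each std::thread stands in for one work-item, and std::atomic<int>::fetch_add plays the role that atomic_add on a __global volatile int* would play inside a kernel. The names BumpBuffer, work_item and run_kernel_sim, and the rule that work-item gid emits gid % 4 results, are hypothetical and chosen only for illustration. Only allocation is shown; a matching free would need a more elaborate scheme.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical host-side stand-in for the kernel arguments: one large
// pre-allocated buffer (its size is elementCount) plus a shared counter
// (elementsUsed) that work-items bump atomically to reserve space.
struct BumpBuffer {
    std::vector<float> data;           // the "large buffer" passed to the kernel
    std::atomic<int> elementsUsed{0};  // next free index, shared by all work-items

    explicit BumpBuffer(int elementCount)
        : data(static_cast<std::size_t>(elementCount)) {}

    int capacity() const { return static_cast<int>(data.size()); }

    // Reserve n consecutive slots. Returns the start index, or -1 if the
    // buffer is exhausted. The counter is bumped even on failure, so the
    // host can later see how many slots were requested in total.
    int allocate(int n) {
        int start = elementsUsed.fetch_add(n, std::memory_order_relaxed);
        if (start + n > capacity()) {
            return -1;  // overflow: this work-item's results are dropped
        }
        return start;
    }
};

// Hypothetical "kernel": work-item gid emits (gid % 4) values, standing in
// for an algorithm whose per-item output length is not known in advance.
void work_item(BumpBuffer& buf, int gid) {
    int n = gid % 4;
    if (n == 0) return;
    int start = buf.allocate(n);
    if (start < 0) return;
    for (int i = 0; i < n; ++i) {
        // Each work-item writes only to the slots it reserved, so no race.
        buf.data[static_cast<std::size_t>(start + i)] = static_cast<float>(gid);
    }
}

// Emulate launching globalSize work-items; returns the final counter value.
int run_kernel_sim(BumpBuffer& buf, int globalSize) {
    std::vector<std::thread> items;
    items.reserve(static_cast<std::size_t>(globalSize));
    for (int gid = 0; gid < globalSize; ++gid) {
        items.emplace_back(work_item, std::ref(buf), gid);
    }
    for (auto& t : items) t.join();  // join = "kernel finished"
    return buf.elementsUsed.load();
}
```

Because the counter is incremented even when a request fails, the host can read elementsUsed after the launch and compare it with the buffer capacity: if it is larger, the launch overflowed, and the counter shows how many slots were requested in total. The cost is the one raised in the thread: the buffer has to be sized for the worst case, or the kernel re-run with a bigger one.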

The pre-allocated buffer may be the only approach for now, but in many cases it wastes memory. I hope to get flexible allocate/free methods in kernels, though that is not easy to do across so many different kinds of hardware, so I don't expect to see them until OpenCL 2.x.

According to the SC11 presentation, OpenCL 2.0 is expected to have a unified virtual memory model, a more flexible parallel model, and a low-level kernel backend for high-level language bindings. These are all exciting features, and OpenCL may reach a truly mature state at that point. I can't wait; I hope it is the next version!
