Some suggestions!

Hi,

Some suggestions for improving this good release:

For completeness:
*Add 3d_image_writes (cl_khr_3d_image_writes) support, since the AMD GPU backend supports it and I have a demo that uses it.
*Add D3D10 interop (cl_khr_d3d10_sharing), similar to the OpenGL interop, so that some Nvidia/AMD samples work too.

Compared to AMD:
*Add cl_ext_device_fission so we can expose multiple concurrent kernels, etc.

More ambitious: add next-gen computing features (as featured in CUDA 3.x):
*Support for non-inlined functions with a stack, which brings function pointers and recursion. Believe it or not, the Nvidia OpenCL GPU backend already supports at least recursion, and function pointers fail only at build time. The GPU Ocelot CPU backend (PTX->LLVM) also supports this right now!
*Similar to printf, expose malloc and free (new in CUDA 3.2).

Also, it seems AMD is working on some C++ support (templated kernels).

What do you think?
Thanks.

Another one:

*Allow an asm("") construct for inserting x86 assembly code directly into kernels.

CUDA allows asm() statements containing PTX code inside CUDA device functions.

We would like to thank you for your suggestions.
Some of these suggestions have been raised internally as well, and are being considered for the next versions of the SDK.

About the specific proposal to allow asm functions directly inside kernels, I do not believe that we will want to go in this direction. Intel's direction is to promote the cross-device approach of OpenCL, and this proposal goes against it. The preferred direction is improving the compiler, making sure that the mapping to assembler instructions is efficient. I believe that the additions made to the OpenCL C language improve the compiler's ability to reach this goal.
However, the direction of adding new built-in functions that map well to SSE instructions is interesting. We do see cases where a code sequence can be efficiently replaced by a single SSE instruction, and the method we prefer is to expose it as a built-in function. This preserves the approach of the C language and is also forward compatible: on a future ISA, the JIT compiler can map the same built-in to a new instruction.

Hi, thanks for your insight. As you say, asm("") is perhaps not a good approach, but I think the other suggestions are still interesting! Really looking forward to seeing how this excellent SDK evolves!
Thanks.
