It is often convenient to keep the same kernel source for different devices. On the other hand, it is often important to apply device-specific optimizations.
If you need separate versions of kernels, one way to keep a single source code base is to use the preprocessor to create CPU-specific or GPU-specific optimized versions of the kernels. You can run clBuildProgram twice on the same program object: once for the CPU with a flag (compiler input) indicating the CPU version, and a second time for the GPU with the corresponding compiler flags. Then, when you create two kernels with clCreateKernel, the runtime has two different versions for each kernel.
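The two-build approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the macro names TARGET_CPU and TARGET_GPU, the scale kernel, and the build_options helper are hypothetical names chosen for this example, and the kernel bodies are placeholders. The OpenCL calls appear only in a comment so that the snippet compiles without an OpenCL SDK.

```c
/* Hypothetical target selector; not part of the OpenCL API. */
typedef enum { TARGET_CPU, TARGET_GPU } target_t;

/* One kernel source shared by both devices. The prototype is identical
   in both variants; only the body selected by the preprocessor differs.
   TARGET_GPU is an assumed macro name that the host defines at build time. */
static const char *kernel_source =
    "__kernel void scale(__global float *data, float factor)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "#if defined(TARGET_GPU)\n"
    "    /* GPU-tuned body would go here (placeholder) */\n"
    "    data[i] *= factor;\n"
    "#else\n"
    "    /* CPU-tuned body would go here (placeholder) */\n"
    "    data[i] = data[i] * factor;\n"
    "#endif\n"
    "}\n";

/* Returns the compiler options that select one variant of the source.
   "-D name" is a standard OpenCL build option that defines a macro. */
static const char *build_options(target_t target)
{
    return target == TARGET_GPU ? "-D TARGET_GPU" : "-D TARGET_CPU";
}

/* With an OpenCL runtime, and assuming program, cpu_device, and gpu_device
   (hypothetical variables) were already created, the two builds would be:
 *
 *   clBuildProgram(program, 1, &cpu_device, build_options(TARGET_CPU), NULL, NULL);
 *   clBuildProgram(program, 1, &gpu_device, build_options(TARGET_GPU), NULL, NULL);
 */
```

Keeping the variant selection in the build options means the host decides once per device which code path is compiled, so the compiled kernel contains no branch for the device type.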
To maintain different versions of a kernel, consider using preprocessor directives rather than regular control flow, as explained in the “Using Specialization in Branching” section. The kernel prototype (the number of arguments and their types) should be the same for all kernels across all devices; otherwise you might get a