The current process to acquire an OpenCL binary looks something like this:
program = clCreateProgramWithSource(...)
clBuildProgram( program, ... )
clGetProgramInfo( program, CL_PROGRAM_BINARIES, ..., binary )
This binary is then stored somewhere and loaded at application runtime by calling clCreateProgramWithBinary.
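To make the store-and-reload step concrete, here is a minimal sketch of it in plain C, assuming the binary for one device is cached in a single file. The names save_blob and load_blob are hypothetical helpers for illustration, not part of the OpenCL API. The buffer passed to save_blob would be the one filled by clGetProgramInfo with CL_PROGRAM_BINARIES, and the buffer returned by load_blob would be handed to clCreateProgramWithBinary.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write `size` bytes of a program binary to `path`.
 * Returns 0 on success, -1 on any I/O failure. */
int save_blob(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    int close_err = fclose(f);
    return (written == size && close_err == 0) ? 0 : -1;
}

/* Hypothetical helper: read a whole file into a malloc'd buffer.
 * On success returns the buffer (caller frees) and stores its length
 * in *size_out; returns NULL on failure. */
unsigned char *load_blob(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Determine the file length by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = (size_t)len;
    return buf;
}
```

Note that caching this way only saves the front-end compile. If the cached bytes are an intermediate representation, clBuildProgram on the reloaded program still has to generate machine code.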
However, this is far too slow: the "binary" is really LLVM bitcode, and the LLVM compiler takes several minutes at runtime to generate machine code from it. (I have a lot of kernels.)
Is there any way to get clCreateProgramWithSource, clGetProgramInfo, or any other method to output a real machine code binary from an OpenCL program?
I realize the root of the problem is the OpenCL spec itself, which gets it wrong when the description of clGetProgramInfo says "The bits returned can be an implementation-specific intermediate representation (a.k.a. IR) or device specific executable bits or both. The decision on which information is returned in the binary is up to the OpenCL implementation." Even so, it seems like Intel made no attempt to fix that mistake in their implementation.
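One rough way to see which of those two cases a given implementation returns is to look at the first bytes of the blob. The sketch below is only a heuristic: classify_blob and the blob_kind enum are hypothetical names, and it only recognizes the raw LLVM bitcode magic ('B', 'C', 0xC0, 0xDE), the LLVM bitcode wrapper magic (0x0B17C0DE stored little-endian), and the ELF magic. An implementation that wraps its IR in its own container (possibly an ELF file with the bitcode in a section) would not be identified correctly by this check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a program binary by its leading magic bytes. */
typedef enum {
    BLOB_UNKNOWN,
    BLOB_LLVM_BITCODE,  /* raw bitcode: 'B' 'C' 0xC0 0xDE */
    BLOB_LLVM_WRAPPER,  /* bitcode wrapper header: 0x0B17C0DE, little-endian */
    BLOB_ELF            /* ELF container: 0x7F 'E' 'L' 'F' */
} blob_kind;

/* Heuristic only: inspects the first four bytes and nothing else. */
blob_kind classify_blob(const unsigned char *data, size_t size)
{
    static const unsigned char bitcode[4] = { 'B', 'C', 0xC0, 0xDE };
    static const unsigned char wrapper[4] = { 0xDE, 0xC0, 0x17, 0x0B };
    static const unsigned char elf[4]     = { 0x7F, 'E', 'L', 'F' };

    if (data == NULL || size < 4) return BLOB_UNKNOWN;
    if (memcmp(data, bitcode, 4) == 0) return BLOB_LLVM_BITCODE;
    if (memcmp(data, wrapper, 4) == 0) return BLOB_LLVM_WRAPPER;
    if (memcmp(data, elf, 4) == 0)     return BLOB_ELF;
    return BLOB_UNKNOWN;
}
```

An ELF result does not by itself prove the blob holds machine code, since the container could still carry IR; it only rules out a bare bitcode file.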
Since I'm pre-compiling and running on the same machine, it would seem reasonable to include machine code for the architectures that are actually available in addition to whatever intermediate representation Intel feels like using. How often are people cross-compiling on disparate platforms? Even so, the backup IR should work. In fact, cross-compilation doesn't even make sense here because the target platform would need to have the LLVM bitcode compiler.