Question regarding programming the Intel HD 4600 GPU on Haswell

Hi! I would like to know whether it is possible to synchronize threads on the GPU with threads on the CPU.

To be more specific: I have a program with two threads, each pinned to a different CPU core. One thread runs only on the CPU, while the second thread offloads its work to the GPU. Is there a mechanism that provides barrier-like synchronization between the CPU thread and the GPU threads?

As a side note, when compiling code meant for GPU offloading with the Intel compiler, I got the following error:

catastrophic error: Can't deduce surface for intrinsic _sfiload_si32.

Can someone please tell me what that means?

Thank you very much.

Thom Popovici


Hi Popovici

Regarding synchronization between CPU and GPU threads, there is currently no explicit mechanism. However, some simple cases can probably be handled with the existing syntax. #pragma offload is synchronous with respect to the CPU thread that executes it, so one can do the following:
1. Spawn another CPU thread to do the CPU work (e.g. with _Cilk_spawn).
2. Run #pragma offload in the current thread.
3. After #pragma offload completes, meaning the GPU work is also complete, wait for the spawned CPU thread to finish (_Cilk_sync, or the implicit synchronization at the end of the enclosing block {}).
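The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: std::thread stands in for the Cilk spawn and sync so that the sketch builds with any C++11 compiler, and the hypothetical function gpu_part runs on the host in place of a real #pragma offload region. The names gpu_part, cpu_part and run_step are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the offloaded kernel. With the Intel compiler this
// loop would sit under a "#pragma offload" region; here it runs on the host so
// that the sketch compiles anywhere.
void gpu_part(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = 2.0f * in[i];
    }
}

// Hypothetical work that stays on the CPU.
void cpu_part(const std::vector<float>& in, float& sum) {
    sum = 0.0f;
    for (float x : in) {
        sum += x;
    }
}

// One round of the pattern. Both parts have finished when this returns,
// which gives a barrier-like meeting point once per call.
float run_step(const std::vector<float>& in, std::vector<float>& out) {
    float sum = 0.0f;
    std::thread cpu(cpu_part, std::cref(in), std::ref(sum)); // step 1: spawn CPU work
    gpu_part(in, out);   // step 2: synchronous "offload" in the current thread
    cpu.join();          // step 3: wait for the spawned CPU thread
    return sum;
}
```

Calling run_step once per phase gives one synchronization point per call; it does not let the two sides meet in the middle of an offloaded region.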

Regarding the error "catastrophic error: Can't deduce surface for intrinsic _sfiload_si32":

Most likely this results from unsupported pointer operations, e.g. pointers to pointers or complicated pointer arithmetic, which prevent the compiler from tracing a pointer back to a pointer-typed argument of the kernel. Could you please share a test case that reproduces this error so that I can look into it and work with the development team?
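As an illustration of the kind of rewrite that may help, here is a minimal sketch that replaces a pointer-to-pointer layout with a single contiguous buffer, so that every access is a plain offset from one pointer argument. Whether this removes the error in a particular program is an assumption until a test case is available; the functions flatten and sum_flat are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a row-of-pointers matrix into one contiguous row-major buffer.
std::vector<float> flatten(float* const* rows, std::size_t n_rows,
                           std::size_t n_cols) {
    assert(rows != nullptr);
    std::vector<float> flat(n_rows * n_cols);
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t c = 0; c < n_cols; ++c) {
            flat[r * n_cols + c] = rows[r][c];
        }
    }
    return flat;
}

// Kernel-style function: all loads go through the single pointer argument
// "data" with simple index arithmetic, with no pointer loaded from memory.
float sum_flat(const float* data, std::size_t n_rows, std::size_t n_cols) {
    float total = 0.0f;
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t c = 0; c < n_cols; ++c) {
            total += data[r * n_cols + c];
        }
    }
    return total;
}
```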

Thanks and Regards


I see. Thanks a lot. I had hoped there was some form of synchronization between the GPU and the CPU. I can synchronize at the CPU level, but that would mean issuing many offloads, because I have loops such as:

loop (1 < i < n)


loop(1 < i < n)

I also have some more constraints at the code level. Anyway, thanks a lot for your answer.


Follow-up question. I managed to rework my algorithms and found a way to use offload, but now I am getting a very weird result when compiling. I have attached a picture.



Attachment: error.png (image/png, 3.67 MB)

Hi Thom

Is it possible to attach that program, or a minimal test case that reproduces this error?

Thanks and Regards
