I tried working with subdevices on the CPU, assuming they could help me control which cores my kernels are assigned to. To be more precise, I have some host threads that need to meet performance guarantees, and I want to be sure that the OpenCL kernels do not run on the same cores. For the host threads I can set the CPU affinity; for the OpenCL kernels I cannot. I thought that subdevices could solve that problem in some way.
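For reference, the host-thread side of this can be sketched as follows. This is a minimal sketch assuming Linux with glibc; the helper names `pin_current_thread` and `current_thread_is_pinned_to` are made up for illustration. It only restricts the calling host thread and says nothing about where the OpenCL runtime places its own worker threads.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: pin the calling thread to one core.
   Returns 0 on success, -1 with errno set on failure. */
static int pin_current_thread(int core)
{
    if (core < 0 || core >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 means "the calling thread" for sched_setaffinity. */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: returns 1 if the calling thread's affinity mask
   contains exactly the given core, 0 otherwise. */
static int current_thread_is_pinned_to(int core)
{
    if (core < 0 || core >= CPU_SETSIZE)
        return 0;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(core, &set);
}
```

A host thread would call `pin_current_thread(n)` once at startup, with `n` being a core it is allowed to run on, and can confirm the result with `current_thread_is_pinned_to(n)`.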
While digging into this topic, I came across some peculiarities:
1. I found that the Intel OpenCL runtime creates one thread per CPU core, each with a specific CPU affinity set. This can be seen in gdb or htop. It is strange, however, that the affinity of those threads is not constant over the whole runtime (i.e. it is reset from time to time).
2. I also found that some of those threads seem to be set to the same affinity. This could, however, be an artifact of htop's refresh rate, which may keep me from seeing when an affinity has changed.
4. Subdevices cannot be created by affinity domain (CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN), although clGetDeviceInfo(..., CL_DEVICE_PARTITION_PROPERTIES, ...) reports it as supported.
5. When subdevices are created using CL_DEVICE_PARTITION_EQUALLY, the number of utilized cores seems to be one less than specified (e.g. partitioning equally into subdevices with 4 compute units each will actually use only 3 CPU cores).
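Observations like the ones above can also be checked from a small program instead of htop, which avoids depending on its refresh rate. The following is a rough sketch assuming Linux with /proc mounted; the helper names `last_cpu_of_thread` and `dump_thread_affinities` are made up for illustration. For every thread of a process it prints the cores the thread is allowed on and the core it last ran on (field 39, "processor", of the thread's stat file).

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: core a thread last ran on, or -1 on error. */
static int last_cpu_of_thread(pid_t pid, pid_t tid)
{
    char path[96], buf[1024];
    snprintf(path, sizeof(path), "/proc/%d/task/%d/stat", (int)pid, (int)tid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) may contain spaces, so start after its closing ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    char *tok = strtok(p + 1, " ");              /* field 3 (state) */
    for (int field = 3; tok != NULL && field < 39; ++field)
        tok = strtok(NULL, " ");                 /* ends on field 39 */
    return tok ? atoi(tok) : -1;
}

/* Hypothetical helper: print allowed cores and last-used core for each
   thread of `pid`. Returns the number of threads printed, or -1 if the
   task directory cannot be opened. */
static int dump_thread_affinities(pid_t pid, FILE *out)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        if (*end != '\0' || tid <= 0)
            continue;                            /* skips "." and ".." */

        cpu_set_t set;
        CPU_ZERO(&set);
        if (sched_getaffinity((pid_t)tid, sizeof(set), &set) != 0)
            continue;                            /* thread may have exited */

        fprintf(out, "tid %ld allowed:", tid);
        for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
            if (CPU_ISSET(cpu, &set))
                fprintf(out, " %d", cpu);
        fprintf(out, " | last ran on: %d\n",
                last_cpu_of_thread(pid, (pid_t)tid));
        ++count;
    }
    closedir(dir);
    return count;
}
```

Calling `dump_thread_affinities(pid, stdout)` in a loop while the kernels are running gives a log of the affinity masks over time, and counting the distinct "last ran on" values gives an estimate of how many cores are actually in use.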
My questions:
- Is it possible to set the CPU affinity per subdevice or for a running OpenCL kernel?
- Can you reproduce and explain the behavior above? Does it make sense that two running kernels share one CPU core, even though they are running on separate subdevices?
System setup:
- Linux 3.7.6-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux
- Intel Core i7 2600K
- Intel OpenCL SDK 2013 Build 56860
Attachments: a minimal running code example and some additional information about my system setup.