NUMA effects with OpenCL

Hi Guys,

I have recently started working with OpenCL on a dual-socket Intel machine (Xeon X5650). How can I control NUMA effects with OpenCL? Is there an API for it, or is it handled by the runtime and hidden from the application?




Hello Jianbin,

You can try the following:

  1. Allocate memory yourself, using something like libnuma to ensure it's all allocated on a single socket.
    Make sure to align the memory to the size of the OpenCL data type you intend to use.
  2. Create memory objects using CL_MEM_USE_HOST_PTR to wrap these allocations.
  3. Use clCreateSubDevices to create sub-devices representing the different NUMA nodes. The current version of the SDK doesn't support partitioning by CL_DEVICE_AFFINITY_DOMAIN_NUMA, but you can use the Intel extension CL_DEVICE_PARTITION_BY_NAMES_INTEL to define yourself which cores are assigned to which sub-device. Read more about it here:

That should allow you to enqueue kernels on a single socket using the appropriate sub-device ID, and you can ensure each kernel operates on memory objects allocated on physical pages from that node.
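For step 3, here is a minimal sketch of how the property list for CL_DEVICE_PARTITION_BY_NAMES_INTEL might be built. The helper name build_partition_by_names is made up for illustration, and the two constants are copied from my reading of CL/cl_ext.h so the snippet compiles without the OpenCL headers; in real code, include the header and use its definitions. Which core numbers belong to which socket depends on the OS and BIOS, so the range in the usage note is an assumption to check on your machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to match the cl_intel_device_partition_by_names section of
 * CL/cl_ext.h; prefer the header's own definitions in real code. */
#define PARTITION_BY_NAMES_INTEL          0x4052  /* CL_DEVICE_PARTITION_BY_NAMES_INTEL */
#define PARTITION_BY_NAMES_LIST_END_INTEL (-1)    /* CL_PARTITION_BY_NAMES_LIST_END_INTEL */

/* Hypothetical helper: builds the list
 *   { BY_NAMES, first_core, ..., first_core + count - 1, LIST_END, 0 }.
 * cl_device_partition_property is a typedef for intptr_t, so the result can
 * be handed to clCreateSubDevices. The caller frees the returned array.
 * Returns NULL if count is 0 or if allocation fails. */
intptr_t *build_partition_by_names(unsigned first_core, unsigned count)
{
    if (count == 0)
        return NULL;

    /* One slot for the partition type, count core names, list end, final 0. */
    intptr_t *props = malloc(((size_t)count + 3) * sizeof *props);
    if (props == NULL)
        return NULL;

    size_t i = 0;
    props[i++] = PARTITION_BY_NAMES_INTEL;
    for (unsigned c = 0; c < count; ++c)
        props[i++] = (intptr_t)(first_core + c);
    props[i++] = PARTITION_BY_NAMES_LIST_END_INTEL;
    props[i] = 0;  /* terminates the property list itself */
    return props;
}
```

Assuming, for example, that cores 0 to 5 sit on socket 0, build_partition_by_names(0, 6) gives a list you could pass as the properties argument of clCreateSubDevices(device, props, 1, &sub_device, NULL). A second call with the other socket's cores would give the second sub-device. Buffers created with CL_MEM_USE_HOST_PTR over memory from something like libnuma's numa_alloc_onnode would then be used only with the queue of the matching sub-device.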

As an aside, the reason there isn't a more straightforward way to go about things is that our testing showed a relatively low return on investment: the performance impact was negligible thanks to Intel QuickPath Interconnect technology.

If you try this and find a case where this has a significant impact, please let us know.





Doron Singer (Intel) wrote:

If you try this and find a case where this has a significant impact, please let us know.

Reductions!  As I reported here:

I haven't tested it on other bandwidth bound applications, but I think it's generally applicable.  Thank you, Doron.

