Fully utilizing a multiprocessor system?


I have a dual-Xeon system (each CPU is quad-core). With Intel OpenCL I can only use one processor at a time: although the query functions report 8 compute units, at most 4 are actually used per kernel invocation.

Any suggestions on how to use both processors simultaneously for the same kernel?


Could you elaborate a bit? The behaviour you're describing isn't what we expect on such machines. How many device IDs are returned by clGetDeviceIDs for the Intel platform ID? (We expect one.) How do you measure utilization? (The best way is probably to use Intel GPA.)

Doron Singer

I'm very sorry for the late response (the forum doesn't send alert e-mails for replies).

I get 1 device with 8 compute units (on a Windows 7 x64 machine).
I measure utilization with Windows Task Manager (maximum CPU usage is 50%, i.e. 4 processors used).
Also, there is absolutely no performance difference between using SetProcessAffinityMask to restrict the process to the first 4 processors and allowing any greater number (5 to 8).
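Task Manager's overall CPU figure is, roughly, the load averaged over all logical processors and over one sampling interval, so a 50% reading on its own does not prove that only 4 processors were busy. The sketch below illustrates that arithmetic; reported_utilization is a hypothetical helper, and the 8 logical processors and 1-second interval are assumptions taken from the report above, not measured values.

```python
def reported_utilization(busy_cpus, busy_seconds, total_cpus, interval=1.0):
    """Approximate Task Manager's overall CPU percentage.

    Assumes the figure is the load averaged over all logical processors
    and over one sampling interval (hypothetical model, for illustration).
    """
    # Fraction of the sampling interval during which the busy CPUs ran.
    time_fraction = min(busy_seconds / interval, 1.0)
    # Average across every logical processor Windows can see.
    return 100.0 * time_fraction * busy_cpus / total_cpus


# 4 of 8 logical CPUs busy for the whole interval...
print(reported_utilization(4, 1.0, 8))
# ...reads the same as all 8 busy for only half of it.
print(reported_utilization(8, 0.5, 8))
```

Both calls give 50.0, which is why per-kernel timing is a more reliable check than the Task Manager graph alone.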

Thanks for your reply. So that we can try to reproduce this issue, could you describe your hardware setup as precisely as possible? Windows 7 edition (Enterprise/Ultimate/etc.), which CPU, etc.?

Another thing you might want to try (though it requires a bit of work) is to use the device fission extension (clCreateSubDevicesEXT) to create two device IDs identifying the two NUMA nodes in your machine, and to submit jobs to both of them simultaneously (via two command queues). Does this break the 50% utilization barrier?

Doron Singer
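One way to keep a single logical kernel launch while using two sub-devices is to split its 1-D NDRange into contiguous chunks and enqueue each chunk on one sub-device's command queue, passing the chunk's start as the global_work_offset argument of clEnqueueNDRangeKernel (that argument must be NULL in OpenCL 1.0 and is usable from OpenCL 1.1). Because get_global_id() already includes the offset, the kernel source does not need to change. The helper below is a minimal sketch that only computes the (offset, size) pairs; split_ndrange is a hypothetical name, and it assumes equally fast sub-devices that share one context so both queues can use the same buffers.

```python
def split_ndrange(global_size, parts, local_size=1):
    """Split a 1-D NDRange into contiguous (offset, size) chunks.

    Each size is a multiple of local_size so every chunk can be enqueued
    with the same work-group size. Each pair would be passed as
    global_work_offset / global_work_size to clEnqueueNDRangeKernel on a
    different sub-device's queue (hypothetical helper, for illustration).
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")

    groups = global_size // local_size
    base, extra = divmod(groups, parts)  # spread leftover groups over the first chunks

    chunks = []
    offset = 0
    for i in range(parts):
        size = (base + (1 if i < extra else 0)) * local_size
        if size > 0:  # a zero-sized NDRange cannot be enqueued, so skip it
            chunks.append((offset, size))
        offset += size
    return chunks


# Two sub-devices (one per NUMA node), work-group size 8:
print(split_ndrange(1000, 2, 8))
```

After enqueuing both chunks, the host would wait on both queues (for example with clFinish on each) before reading results, so the pair still behaves like one launch from the application's point of view.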

Windows 7 Ultimate x64
Dual Intel Xeon E5500
12 GB RAM
(a Dell Precision system)

I also tried testing on another system with 4 Xeon processors. It initially ran Windows Server 2003, but the Intel OpenCL SDK doesn't work on it. Ironically, the system also refuses to install Windows 7 x64 (a bug that Microsoft knows about but hasn't fixed yet). Finally I tried 32-bit Windows 7 Professional: it worked, but recognized only 2 processors (a limitation imposed by Microsoft), and it suffered from the same 50% usage problem, this time on a 32-bit system.
Another question: are there any plans for Intel to support Windows Server?

I will try the fission extension as soon as possible (though even if it works, I need to use the processors as a single device, not two).

Our SDK is designed to work on any Windows version based on NT 6.0 and above. That includes Windows Vista, Windows 7 and Windows Server 2008.
Windows Server 2003 is based on NT 5.2 (the same NT 5.x family as Windows XP), which is not supported by this version of the SDK.
For more information on our supported platforms, please see the product's release notes on our website: http://software.intel.com/en-us/articles/opencl-release-notes/

Uri Levy

Hello again,

So far we've been unable to reproduce this issue, admittedly on somewhat different hardware setups. Could you provide us with a reproducer so we can make sure we're testing the right thing?

Doron Singer

Hi, just a thought on the issue: does your specific Xeon CPU support Hyper-Threading (HT)? It may be that the SDK is not using the HT capability of the CPU and is running on the physical cores only (of which there may be only 4). As far as I understand, with HT enabled the system should report twice as many logical processors as physical cores. If I am wrong, then disregard this comment.

Hello Lee,

It's a good thought, but the SDK is implemented to take advantage of all Intel CPU features, including Hyperthreading technology. Full utilization is expected on supported processors, some of which support Hyperthreading.


Hi, a utilization of 50% might be related to the fact that your app is doing other work besides computing with OpenCL, e.g. reading data from files, rendering with DirectX, etc. In those and many other cases, an app can spend a significant portion of its time waiting on some OS/driver synchronization routine.

With the conventional Windows Task Manager that you are currently using to check CPU utilization, you can see how much time your app spends deep in the OS via View -> Show Kernel Times (on the Performance tab).

Also, if your tasks are really lightweight (e.g. they utilize the CPUs at 100% but only for a short period), the resolution of Windows Task Manager might be insufficient to capture how the load is distributed over time.
The best way is to use the OpenCL performance counters to collect the time spent in OpenCL kernels and double-check whether it is a significant portion of the total wall-clock time.
Refer to http://software.intel.com/en-us/articles/performance-debugging-intro/
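As a minimal sketch of that check: on a command queue created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo returns CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END for each kernel event as nanosecond timestamps. The helper below (kernel_fraction is a hypothetical name) assumes those pairs have already been collected and that kernels ran one at a time on a single queue; overlapping kernels on several queues would be double-counted.

```python
NS_PER_S = 1e9


def kernel_fraction(events, wall_seconds):
    """Fraction of wall-clock time spent inside OpenCL kernels.

    events: (start_ns, end_ns) pairs read from CL_PROFILING_COMMAND_START
    and CL_PROFILING_COMMAND_END. Assumes non-overlapping kernels.
    """
    busy_ns = sum(end - start for start, end in events)
    return busy_ns / NS_PER_S / wall_seconds


# Two kernels taking 0.2 s and 0.3 s during a 1 s run:
print(kernel_fraction([(0, 200_000_000), (500_000_000, 800_000_000)], 1.0))
```

A result well below 1.0 would mean the application spends much of its time outside the kernels (I/O, rendering, synchronization), which by itself can keep Task Manager near 50%.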

I would also suggest increasing the OpenCL load (e.g. by processing more data) to check the scaling.
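A minimal sketch of how such a scaling experiment could be read, assuming each run records the number of work-items processed and the elapsed kernel time. If throughput keeps rising as the data grows, the small runs were dominated by fixed per-launch overhead rather than by compute. The helper names and the 10% threshold are hypothetical choices for illustration.

```python
def throughput_table(measurements):
    """Return (items, items_per_second) for each (items, seconds) run, smallest first."""
    return [(items, items / seconds) for items, seconds in sorted(measurements)]


def still_growing(measurements, threshold=1.1):
    """True if the largest run is noticeably faster per item than the smallest,
    suggesting the small runs were overhead-bound (threshold is an assumption)."""
    table = throughput_table(measurements)
    return table[-1][1] >= threshold * table[0][1]


runs = [(1_000_000, 0.02), (10_000_000, 0.05), (100_000_000, 0.4)]
for items, rate in throughput_table(runs):
    print(f"{items:>11} items: {rate:.3g} items/s")
print(still_growing(runs))
```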

Many thanks for all the replies.
The problem was magically solved with SDK 1.1, and usage went up to a normal 99%.
