How many cores does clEnqueueTask use?

evk8888

Hello guys, I am using OpenCL 1.1 with a 24-core Intel Xeon processor. When I enqueue a task for execution using clEnqueueTask(), is it supposed to use a single core for execution? Thanks.

Evgeny Fiksman (Intel)

Hi,

Yes, it will run in a single thread.

According to the OpenCL spec, a task has a global size of (1,1,1), which means a single work-item.

Thanks,
Evgeny
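
To illustrate the answer above, here is a minimal sketch in C. The `ndrange_t` type, the helper names and the `USE_OPENCL` build flag are made up for this example; only `clEnqueueTask` and `clEnqueueNDRangeKernel` are real OpenCL calls. It assumes, as the OpenCL 1.1 spec describes, that a task is equivalent to a one-dimensional launch with a global size of 1 and a local size of 1.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C description of an NDRange launch (illustrative type, not part of OpenCL). */
typedef struct {
    unsigned work_dim;        /* number of dimensions in use, 1..3 */
    size_t   global_size[3];  /* work-items per dimension */
    size_t   local_size[3];   /* work-group size per dimension */
} ndrange_t;

/* The launch shape a task corresponds to: one dimension, one work-item,
 * in a work-group of size one. */
ndrange_t task_ndrange(void)
{
    ndrange_t r = { 1, { 1, 1, 1 }, { 1, 1, 1 } };
    return r;
}

/* Total number of work-items in a launch: product of the used global sizes. */
size_t work_item_count(const ndrange_t *r)
{
    assert(r->work_dim >= 1 && r->work_dim <= 3);
    size_t n = 1;
    for (unsigned d = 0; d < r->work_dim; ++d)
        n *= r->global_size[d];
    return n;
}

#ifdef USE_OPENCL  /* hypothetical flag: define it when building against an OpenCL SDK */
#include <CL/cl.h>

/* Intended to have the same effect as clEnqueueTask(queue, kernel, 0, NULL, NULL). */
cl_int enqueue_as_task(cl_command_queue queue, cl_kernel kernel)
{
    ndrange_t r = task_ndrange();
    return clEnqueueNDRangeKernel(queue, kernel, r.work_dim, NULL,
                                  r.global_size, r.local_size,
                                  0, NULL, NULL);
}
#endif
```

With a single work-item there is nothing to distribute, so the remaining cores stay idle for that command. To use more of them, enqueue an NDRange with a larger global size instead of a task.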

evk8888

Hello, thanks for your reply. I have another question. Say I use device fission to split a 4-core machine into 4 sub-devices with 1 core each, and I create a separate queue for each sub-device. Is it possible to call clEnqueueReadBuffer() on a cl_mem buffer using the queue of sub-device 1, 2, 3 or 4, irrespective of which sub-device executed the kernel? Or is it possible to have one global queue for data transfers to/from the device and separate queues for the sub-devices to execute tasks? Will this work? Thanks a lot.

Evgeny Fiksman (Intel)

Theoretically it should.
Be aware that, as stated in the Release Notes, device fission is experimental and you might see inconsistent results.

We would be glad to hear your feedback on your experience with this feature.
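
The setup discussed above could be sketched roughly as follows. This is an untested outline, not Intel's recommended usage: it assumes an OpenCL 1.1 SDK that exposes `cl_ext_device_fission`, and it relies on the general OpenCL rule that a buffer belongs to a context and can be used with any command queue created in that same context. The helper names, `MAX_SUBS` and the `USE_OPENCL` flag are invented for the example, and since the feature is experimental, results may vary.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sub-devices an "equally" partition produces: one per full
 * group of units_per_sub compute units (leftover units stay unused). */
unsigned sub_device_count(unsigned compute_units, unsigned units_per_sub)
{
    assert(units_per_sub > 0);
    return compute_units / units_per_sub;
}

#ifdef USE_OPENCL  /* hypothetical flag: define it when building against an OpenCL SDK */
#include <CL/cl.h>
#include <CL/cl_ext.h>

#define MAX_SUBS 64  /* arbitrary upper bound chosen for this sketch */

/* Split `parent` into single-core sub-devices, put them all in ONE context,
 * and create one in-order queue per sub-device. Returns the context or NULL. */
cl_context make_queues(cl_device_id parent, cl_command_queue queues[MAX_SUBS],
                       cl_uint *num_subs)
{
    /* Extension entry points are looked up at run time in OpenCL 1.1. */
    clCreateSubDevicesEXT_fn create_subs = (clCreateSubDevicesEXT_fn)
        clGetExtensionFunctionAddress("clCreateSubDevicesEXT");
    if (!create_subs)
        return NULL;

    const cl_device_partition_property_ext props[] = {
        CL_DEVICE_PARTITION_EQUALLY_EXT, 1, CL_PROPERTIES_LIST_END_EXT
    };
    cl_device_id subs[MAX_SUBS];
    if (create_subs(parent, props, MAX_SUBS, subs, num_subs) != CL_SUCCESS)
        return NULL;

    cl_int err;
    cl_context ctx = clCreateContext(NULL, *num_subs, subs, NULL, NULL, &err);
    if (err != CL_SUCCESS)
        return NULL;

    for (cl_uint i = 0; i < *num_subs; ++i) {
        queues[i] = clCreateCommandQueue(ctx, subs[i], 0, &err);
        if (err != CL_SUCCESS)
            return NULL;  /* cleanup omitted for brevity */
    }
    return ctx;
}

/* Run `kernel` as a task on one queue, then read `buf` back through another
 * queue of the same context. The event orders the read after the kernel. */
cl_int run_then_read(cl_command_queue run_q, cl_command_queue read_q,
                     cl_kernel kernel, cl_mem buf, size_t bytes, void *host_dst)
{
    cl_event done;
    cl_int err = clEnqueueTask(run_q, kernel, 0, NULL, &done);
    if (err != CL_SUCCESS)
        return err;
    /* Flush so the task is submitted before another queue waits on its event. */
    clFlush(run_q);
    err = clEnqueueReadBuffer(read_q, buf, CL_TRUE, 0, bytes, host_dst,
                              1, &done, NULL);
    clReleaseEvent(done);
    return err;
}
#endif
```

Here `read_q` can be any of the sub-device queues or a dedicated transfer queue, as long as it was created in the same context as `buf`. Separate queues do not synchronize with each other on their own, which is why the read waits on the task's event.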

Jim Vaughn

Hi,

If you create different queues for each sub-device, you will have to copy the memory to each "device", giving you separate memory on each device. Also, clEnqueueReadBuffer() takes the queue, program and context, so for all intents and purposes they are separate memory (right?). To be fair, the spec for the cl_ext_device_fission extension is very low on information about what I see as a complex subject.

Evgeny Fiksman (Intel)

Thanks for the good question.

In the Intel implementation, sub-devices share the memory resources of the parent device. The exception is NUMA-aware systems, where the implementation may try to place memory objects on the appropriate NUMA node.

Sub-devices use separate execution units; on the CPU device those are different HW threads.

According to the spec, programs should be compiled separately for each sub-device; however, an implementation may have a single program for all sub-devices of a parent device.

Evgeny
