Developer Guide

Specify Number of SIMD Work-Items

You can increase the data-processing efficiency of a SYCL kernel by executing multiple work-items in a single instruction, multiple data (SIMD) manner, without manually vectorizing your kernel code.
Specify the number of work-items within a work-group that the Intel® oneAPI DPC++/C++ Compiler should execute in a SIMD or vectorized manner.
Important: Use the [[intel::num_simd_work_items(N)]] attribute in conjunction with the [[cl::reqd_work_group_size(Z, Y, X)]] attribute. The value N that you specify for [[intel::num_simd_work_items(N)]] must evenly divide the work-group size that you specify for the [[cl::reqd_work_group_size(Z, Y, X)]] attribute.
To specify the number of SIMD work-items in a work-group, insert the [[intel::num_simd_work_items(N)]] attribute in the kernel source code.
Consider the following example:
    cgh.parallel_for<class kernelComputeSIMD>(
        // N global work-items, in work-groups of REQD_WORK_GROUP_SIZE
        nd_range<1>(range<1>(N), range<1>(REQD_WORK_GROUP_SIZE)),
        [=](nd_item<1> it)
            [[intel::num_simd_work_items(NUM_SIMD_WORK_ITEMS),
              cl::reqd_work_group_size(1, 1, REQD_WORK_GROUP_SIZE)]] {
          auto gid = it.get_global_id(0);
          accessorRes[gid] = cl::sycl::sqrt(accessorIdx[gid]);
        });
Always use the [[intel::num_simd_work_items(N)]] attribute with [[cl::reqd_work_group_size(Z, Y, X)]], and make sure that REQD_WORK_GROUP_SIZE % NUM_SIMD_WORK_ITEMS equals 0.
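The divisibility requirement can also be checked on the host before the kernel is compiled. The following is a minimal sketch in standard C++, not part of the compiler or of SYCL: the helper names simd_width_divides and valid_simd_widths are hypothetical, and the values 64 and 4 are assumed purely for illustration. It only checks divisibility; the compiler may impose further limits on N that this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a SYCL or Intel API): true when simd_width
// evenly divides work_group_size. A width of 0 is rejected.
constexpr bool simd_width_divides(std::size_t work_group_size,
                                  std::size_t simd_width) {
  return simd_width != 0 && work_group_size % simd_width == 0;
}

// Assumed example values; substitute the constants used by your kernel.
constexpr std::size_t REQD_WORK_GROUP_SIZE = 64;
constexpr std::size_t NUM_SIMD_WORK_ITEMS = 4;

// Fails the build early if the two constants violate the rule
// REQD_WORK_GROUP_SIZE % NUM_SIMD_WORK_ITEMS == 0.
static_assert(simd_width_divides(REQD_WORK_GROUP_SIZE, NUM_SIMD_WORK_ITEMS),
              "NUM_SIMD_WORK_ITEMS must evenly divide REQD_WORK_GROUP_SIZE");

// Hypothetical helper: lists every value that evenly divides the given
// work-group size, in ascending order. Divisibility is the only criterion.
inline std::vector<std::size_t> valid_simd_widths(std::size_t work_group_size) {
  assert(work_group_size > 0 && "work-group size must be positive");
  std::vector<std::size_t> widths;
  for (std::size_t w = 1; w <= work_group_size; ++w) {
    if (simd_width_divides(work_group_size, w)) {
      widths.push_back(w);
    }
  }
  return widths;
}
```

Placing the static_assert next to the definitions of the two constants means a mismatched pair is reported at the constants themselves rather than at the kernel attribute.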
For additional information about the [[cl::reqd_work_group_size(Z, Y, X)]] attribute, refer to Specify a Work-Group Size.
