Developer Guide

Specify a Work-Group Size

Specify a maximum or the required work-group size whenever possible. The Intel® oneAPI DPC++/C++ Compiler relies on this specification to optimize hardware use of the DPC++ kernel without involving excess logic.
  • If you do not specify the [[intel::max_work_group_size(Z, Y, X)]] or [[cl::reqd_work_group_size(Z, Y, X)]] attribute in your kernel, the work-group size assumes a default value that depends on compile-time and run-time constraints.
  • If your kernel contains a barrier, the Intel® oneAPI DPC++/C++ Compiler sets a default maximum scalarized work-group size of 128 work-items.
  • If your kernel does not query any DPC++ intrinsics that allow different threads to behave differently (that is, local or global thread IDs, or the work-group ID), the Intel® oneAPI DPC++/C++ Compiler infers a single-threaded execution mode and sets the maximum work-group size to (1, 1, 1). In this case, the DPC++ runtime also enforces a global enqueue size of (1, 1, 1), and the compiler enables loop pipelining optimizations.
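The default rules above can be summarized as a small decision function. This is only a sketch in plain C++17 for illustration, not the compiler's implementation: the `KernelTraits` struct and `default_max_work_items` function are hypothetical names, and the order in which the two rules are checked is an assumption, since the text does not say which one wins when both apply.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical summary of the kernel properties the text refers to.
struct KernelTraits {
    bool has_barrier;            // kernel contains a barrier
    bool queries_work_item_ids;  // kernel reads local/global IDs or the work-group ID
};

// Default stated in the text for kernels that contain a barrier.
constexpr std::size_t kBarrierDefaultMaxWorkItems = 128;

// Returns the default maximum number of work-items per work-group when no
// work-group size attribute is present, or std::nullopt when the default
// depends on compile-time and run-time constraints.
// Assumption: the single-threaded rule is checked before the barrier rule.
std::optional<std::size_t> default_max_work_items(const KernelTraits& k) {
    if (!k.queries_work_item_ids) {
        return std::size_t{1};  // single-threaded mode: work-group size (1, 1, 1)
    }
    if (k.has_barrier) {
        return kBarrierDefaultMaxWorkItems;
    }
    return std::nullopt;  // no fixed default stated in the text
}
```

For instance, `default_max_work_items({true, true})` models a kernel that uses both a barrier and work-item IDs, which falls under the 128 work-item default.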
To specify the work-group size, modify your kernel code in the following manner:
  • To specify the maximum number of work-items that the compiler provisions for a work-group in a kernel, insert the [[intel::max_work_group_size(Z, Y, X)]] attribute in your kernel source code. For example:

    constexpr unsigned MAX_WG_SIZE = 4;
    ...
    cgh.parallel_for<class kernelCompute>(
        nd_range<1>(range<1>(N), range<1>(wg_size)),
        [=](nd_item<1> it) [[intel::max_work_group_size(1, 1, MAX_WG_SIZE)]] {
          // Each work-item doubles one input element.
          auto gid = it.get_global_id(0);
          accessorRes[gid] = accessorIdx[gid] * 2;
        });
  • To specify the required number of work-items that the Intel® oneAPI DPC++/C++ Compiler provisions for a work-group in a kernel, insert the [[cl::reqd_work_group_size(Z, Y, X)]] attribute in your kernel source code. For example:

    [[cl::reqd_work_group_size(1, 1, 64)]]
    void sum(cl::sycl::nd_item<1> item,
             accessor<float, 1, access::mode::read, access::target::global_buffer> a,
             accessor<int, 1, access::mode::read, access::target::global_buffer> b,
             // The output accessor needs write access.
             accessor<int, 1, access::mode::write, access::target::global_buffer> answer) {
      size_t gid = item.get_global_id(0);
      answer[gid] = a[gid] + b[gid];
    }
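Because both attributes constrain the local size you pass to nd_range, it can help to check the launch configuration on the host before enqueuing the kernel. The following is a minimal sketch in plain C++, not part of any DPC++ API: the helper names are hypothetical, and it assumes the maximum applies to each dimension separately.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Work-group dimensions in the same (Z, Y, X) order the attributes use.
using Dims = std::array<std::size_t, 3>;

// True when every dimension of the local size is nonzero and stays within
// the bound given to [[intel::max_work_group_size(Z, Y, X)]].
// Assumption: the bound is applied per dimension.
bool fits_max_work_group_size(const Dims& local, const Dims& max_size) {
    for (std::size_t d = 0; d < 3; ++d) {
        if (local[d] == 0 || local[d] > max_size[d]) {
            return false;
        }
    }
    return true;
}

// True when the local size equals the size given to
// [[cl::reqd_work_group_size(Z, Y, X)]] in every dimension.
bool matches_reqd_work_group_size(const Dims& local, const Dims& reqd) {
    return local == reqd;
}

// Rounds a 1-D global size up to the next multiple of the work-group size,
// so that the global range divides evenly into work-groups.
std::size_t round_up_to_work_group(std::size_t global, std::size_t wg) {
    assert(wg > 0 && "work-group size must be positive");
    return (global + wg - 1) / wg * wg;
}
```

With a required work-group size of 64 and 1000 elements, `round_up_to_work_group(1000, 64)` yields a padded global size of 1024. A kernel launched over a padded range needs a bounds check so that the extra work-items do not access elements past the end of the data.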
