User Guide

  • 2021.2
  • 03/26/2021
  • Public Content

Invalid Kernel Argument Size

Occurs when the total size of the kernel arguments exceeds the register space available for argument storage.
Problem type: Memory leak
ID | Code Location   | Description
1  | Allocation site | Represents the source location where arguments are passed from the host to a kernel.
By default, kernel arguments are promoted to registers where possible. The share of registers available for kernel arguments varies by device (usually 25-50%). Arguments that fall outside this limit are cut off, which causes logic errors in the kernel.
In OpenCL™, arguments are specified directly using the clSetKernelArg function. In this case, the problem is more obvious and may appear if a large number of arguments or wide structures are used.
In Data Parallel C++ (DPC++), arguments may be specified implicitly through the lambda capture list. If the list is specified as [=] or as [&], every variable used in the lambda body is captured. In this case, a large number of arguments, wide structures, or classes may be passed to a kernel accidentally.
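The effect of a default capture list can be sketched in standard C++ without a SYCL runtime. The sketch below assumes that the size of a closure roughly approximates the bytes a DPC++ compiler would pass as kernel arguments. The Settings structure and both function names are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side structure: one small field next to a wide table.
struct Settings {
    double table[512];  // 512 x 8 bytes = 4096 bytes on typical platforms
    int scale;
};

// Default capture [=]: using any member of `s` copies the whole object
// into the closure, although only `scale` is needed.
inline std::size_t implicit_capture_size() {
    Settings s{};
    auto kernel_like = [=](int i) { return i * s.scale; };
    (void)kernel_like(1);
    assert(sizeof(kernel_like) >= sizeof(Settings));
    return sizeof(kernel_like);
}

// Explicit capture: copy the needed field to a local variable first and
// name only that variable in the capture list.
inline std::size_t explicit_capture_size() {
    Settings s{};
    const int scale = s.scale;
    auto kernel_like = [scale](int i) { return i * scale; };
    (void)kernel_like(1);
    return sizeof(kernel_like);
}
```

Comparing the two return values shows how a single member access under [=] can pull an entire structure into the argument list, while the explicit form carries only one int.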
In this diagnostic, Intel® Inspector displays the total size of the kernel arguments (in bytes) compared with the device limit.
DPC++ Example
    const int N = 1000;

    struct Data {
        double numbers[N];
    };

    Data data;

    queue.submit([&](cl::sycl::handler &cgh) {
        cgh.parallel_for<class my_task>(cl::sycl::range<1>{N}, [=](cl::sycl::id<1> idx) {
            // Implicit usage of the Data structure from the host
            deviceData[0] += data.numbers[idx];
        });
    });
    queue.wait();

    // The structure contains 1000 doubles x 8 bytes = 8000 bytes of data,
    // which exceeds the available kernel argument limit.
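The size arithmetic in the example can be checked on the host. The following is a minimal sketch in standard C++. The limit kAssumedArgLimitBytes is an invented value for illustration only (the real limit depends on the device), and fits_argument_budget is a hypothetical helper, not an Intel® Inspector or DPC++ API.

```cpp
#include <cassert>
#include <cstddef>

constexpr int N = 1000;

struct Data {
    double numbers[N];
};

// Assumed limit for illustration only; query or measure the real value
// for the target device.
constexpr std::size_t kAssumedArgLimitBytes = 2048;

// Hypothetical helper: true if the arguments fit within the given limit.
constexpr bool fits_argument_budget(std::size_t total_bytes,
                                    std::size_t limit_bytes) {
    return total_bytes <= limit_bytes;
}

// The structure is a plain array, so its size is N doubles
// (8000 bytes where sizeof(double) is 8).
static_assert(sizeof(Data) == N * sizeof(double),
              "Data is expected to hold exactly N doubles");

// Runtime form of the same check, usable in a debug build.
inline void check_example_exceeds_limit() {
    assert(!fits_argument_budget(sizeof(Data), kAssumedArgLimitBytes));
}
```

A check of this kind can be placed next to the kernel submission so that an oversized structure is noticed before the code reaches the device.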

Possible Correction Strategies

To avoid the problem, use the following hints:
  • Reduce the number of arguments used if possible.
  • Avoid passing complex structures to a kernel. Consider passing separate fields instead of a complete data structure.
  • Narrow the type of data. Consider using float instead of double, int instead of long, etc.
  • In a DPC++ program, specify the lambda capture list explicitly instead of using a default capture list specified by [=] or [&].
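The hints above can be illustrated with a short sketch in standard C++ that compares sizes on the host. All type and function names here are hypothetical. The wide variant uses long long rather than long so that the 8-byte width holds on all mainstream platforms. As an assumption, the closure size again stands in for the bytes passed to a kernel.

```cpp
#include <cassert>
#include <cstddef>

// Wide variant: 64-bit element and counter types.
struct WideParams {
    double coeff[64];
    long long count;
};

// Narrowed variant: same layout with 32-bit element and counter types.
struct NarrowParams {
    float coeff[64];
    int count;
};

// Separate fields plus an explicit capture list: only the two values the
// lambda body needs are copied into the closure.
inline std::size_t separate_fields_closure_size() {
    WideParams p{};
    const double first = p.coeff[0];
    const long long count = p.count;
    auto kernel_like = [first, count](long long i) {
        return i < count ? first : 0.0;
    };
    (void)kernel_like(0);
    assert(sizeof(kernel_like) < sizeof(WideParams));
    return sizeof(kernel_like);
}
```

On typical platforms, narrowing the types roughly halves the structure, while passing only the required fields shrinks the argument footprint to a few bytes. In a real DPC++ kernel, bulk data such as the coefficient array would normally be reached through an accessor or a device-accessible pointer rather than captured by value.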
