Developer Guide

Specialization Constants

DPC++ has a feature called specialization constants that can explicitly trigger JIT compilation to generate code from the intermediate SPIR-V code based on the run-time values of these specialization constants. This JIT compilation happens during program execution, once the values of the constants are known. It is different from the JIT compilation that is triggered based on the options provided to -fsycl-device-code-split.
In the example below, the call to set_spec_constant binds the value returned by the function get_value, defined near the top of the listing, to the SYCL program. When the program is initially compiled, this value is not known and so cannot be used for optimizations. At runtime, after get_value is executed, the value is known, so it is used by build_with_kernel_type to trigger JIT compilation of the specialized kernel with this value.

    #include <CL/sycl.hpp>
    #include <iostream>
    #include <vector>

    class specialized_kernel;
    class runtime_const;

    // Fetch a value at runtime.
    float get_value() { return 10; }

    int main() {
      sycl::queue queue;
      sycl::program program(queue.get_context());

      // Create a specialization constant and bind the runtime value to it.
      sycl::ONEAPI::experimental::spec_constant<float, runtime_const> my_constant =
          program.set_spec_constant<runtime_const>(get_value());
      // JIT-compile the kernel now that the value is known.
      program.build_with_kernel_type<specialized_kernel>();

      std::vector<float> vec(1);
      {
        sycl::buffer<float> buffer(vec.data(), vec.size());
        queue.submit([&](auto &cgh) {
          sycl::accessor acc(buffer, cgh, sycl::write_only, sycl::noinit);
          cgh.template single_task<specialized_kernel>(
              program.get_kernel<specialized_kernel>(),
              [=]() { acc[0] = my_constant.get(); });
        });
      } // The buffer goes out of scope here, so the result is copied back to vec.
      queue.wait_and_throw();

      std::cout << vec[0] << std::endl;

      return 0;
    }

After JIT compilation, the specialized kernel passed to single_task effectively becomes the code shown below:

    cgh.single_task<specialized_kernel>(
        program.get_kernel<specialized_kernel>(),
        [=]() { acc[0] = 10; });

This JIT compilation also has an impact on the amount of time it takes to execute a kernel. This is illustrated by the example below.

    #include <CL/sycl.hpp>
    #include <chrono>
    #include <iostream>
    #include <vector>

    class specialized_kernel;
    class runtime_const;

    // Fetch a value at runtime.
    float get_value() { return 10; }

    int main() {
      sycl::queue queue;
      sycl::program program(queue.get_context());

      // Create a specialization constant.
      sycl::ONEAPI::experimental::spec_constant<float, runtime_const> my_constant =
          program.set_spec_constant<runtime_const>(get_value());

      // Time the JIT compilation of the specialized kernel.
      // count() reports steady_clock ticks, which are nanoseconds on common
      // implementations.
      auto start = std::chrono::steady_clock::now();
      program.build_with_kernel_type<specialized_kernel>();
      auto end = std::chrono::steady_clock::now();
      std::cout << "specialization took - " << (end - start).count()
                << " nano-secs\n";

      std::vector<float> vec{0, 0, 0, 0, 0};
      sycl::buffer<float> buffer1(vec.data(), vec.size());
      sycl::buffer<float> buffer2(vec.data(), vec.size());

      // Time the execution of the kernel that was already JIT-compiled.
      start = std::chrono::steady_clock::now();
      {
        queue.submit([&](auto &cgh) {
          sycl::accessor acc(buffer1, cgh, sycl::write_only, sycl::noinit);
          cgh.template single_task<specialized_kernel>(
              program.get_kernel<specialized_kernel>(),
              [=]() { acc[0] = my_constant.get(); });
        });
        queue.wait_and_throw();
      }
      end = std::chrono::steady_clock::now();
      {
        sycl::host_accessor host_acc(buffer1, sycl::read_only);
        std::cout << "result1 (c): " << host_acc[0] << " " << host_acc[1] << " "
                  << host_acc[2] << " " << host_acc[3] << " " << host_acc[4]
                  << std::endl;
      }
      std::cout << "execution took : " << (end - start).count() << " nano-secs\n";

      // Time an equivalent kernel that does not use specialization constants.
      start = std::chrono::steady_clock::now();
      {
        queue.submit([&](auto &cgh) {
          sycl::accessor acc(buffer2, cgh, sycl::write_only, sycl::noinit);
          cgh.single_task([=]() { acc[0] = 20; });
        });
        queue.wait_and_throw();
      }
      end = std::chrono::steady_clock::now();
      {
        sycl::host_accessor host_acc(buffer2, sycl::read_only);
        std::cout << "result2 (c): " << host_acc[0] << " " << host_acc[1] << " "
                  << host_acc[2] << " " << host_acc[3] << " " << host_acc[4]
                  << std::endl;
      }
      std::cout << "execution took - " << (end - start).count() << " nano-secs\n";
    }

Looking at the run times reported by the timing messages, the initial translation of the kernel (the call to build_with_kernel_type) takes a long time, while the actual execution of the already JIT-compiled kernel takes less time. The equivalent kernel submitted without specialization constants takes longer because the runtime has to JIT-compile it before executing it, and that compilation falls inside the timed region.
