In this situation, it is important to compile the kernel with the compiler command option that allows the Intel® oneAPI DPC++/C++ Compiler to rearrange operations in a way that exposes the accumulation. If you do not compile the kernel with this option, the resulting accumulator structure has a high initiation interval (II). II is the number of cycles between launching successive loop iterations. The higher the II value, the longer the accumulator structure must wait before it can process the next loop iteration.
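
The following sketch illustrates, in plain host C++, why the compiler needs permission to rearrange a floating-point accumulation. It is not the structure the compiler generates and it is not FPGA kernel code. The function names `accumulate_in_order` and `accumulate_reassociated` and the `lanes` parameter are hypothetical, chosen only for this illustration. In the first function every addition depends on the previous sum, which is the kind of loop-carried dependency that raises II. In the second function the additions are regrouped into independent partial sums. Because floating-point addition is not associative, the two functions can round differently for general inputs, so a compiler cannot make this kind of change unless it is allowed to reorder floating-point operations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict source order: each addition depends on the previous value of `sum`,
// so iteration i+1 cannot begin its add until iteration i has produced a result.
float accumulate_in_order(const std::vector<float>& data) {
    float sum = 0.0f;
    for (float x : data) {
        sum += x;
    }
    return sum;
}

// Reassociated form (hypothetical illustration): `lanes` independent partial
// sums are updated round-robin, so consecutive iterations touch different
// accumulators and do not depend on each other. The partial sums are combined
// once at the end.
float accumulate_reassociated(const std::vector<float>& data, std::size_t lanes) {
    assert(lanes > 0);
    std::vector<float> partial(lanes, 0.0f);
    for (std::size_t i = 0; i < data.size(); ++i) {
        partial[i % lanes] += data[i];
    }
    float sum = 0.0f;
    for (float p : partial) {
        sum += p;
    }
    return sum;
}
```

For inputs that are exactly representable and whose partial sums stay exact, such as small integers, both functions return the same value. For general data the results can differ in the last bits, which is the accuracy trade-off implied by letting the compiler rearrange the operations.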