Minimize the Memory Dependencies for Loop Pipelining
- Ensure that the Intel® oneAPI DPC++/C++ Compiler does not assume false dependencies.
- When the static memory dependence analysis fails to prove that a dependency does not exist, the Intel® oneAPI DPC++/C++ Compiler assumes that a dependency exists and modifies the kernel execution to enforce it. Enforcing the dependency costs less if the memory system is stall-free.
- Write-after-read operations with a data dependency on a load-store unit can achieve an initiation interval of just two clock cycles (II=2). Other stall-free scenarios can require an II of up to seven clock cycles.
- The Intel® oneAPI DPC++/C++ Compiler can fully resolve the read-after-write (control dependency) operation.
- If you are sure that a loop carries no dependencies, override the static memory dependence analysis by adding the line [[intelfpga::ivdep]] before the loop in your kernel code. For more information, refer to ivdep Attribute.
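The ivdep override described above is safe only when the loop truly has no loop-carried memory dependency. A typical case is indirect addressing: in `data[idx[i]] *= factor`, the store address depends on runtime data, so static analysis cannot prove that two iterations never touch the same element. The sketch below is plain C++ written for illustration. The function names `scale_selected` and `indices_are_distinct` are hypothetical. In a real DPC++ FPGA design, the loop would sit inside the kernel body. A compiler that does not recognize `intelfpga::ivdep` ignores the attribute, usually with a warning, so the code still builds and runs on a host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every entry of idx is a distinct
// value in [0, n). Distinct indices are what make the ivdep below safe.
bool indices_are_distinct(const std::vector<int>& idx, std::size_t n) {
  std::vector<bool> seen(n, false);
  for (int k : idx) {
    if (k < 0 || static_cast<std::size_t>(k) >= n || seen[k]) return false;
    seen[k] = true;
  }
  return true;
}

// Hypothetical kernel-style loop: scales the selected elements in place.
// The compiler cannot prove statically that idx has no repeated entries,
// so without ivdep it would assume a loop-carried dependency.
void scale_selected(std::vector<float>& data, const std::vector<int>& idx,
                    float factor) {
  // Debug-build check of the assumption that justifies ivdep.
  assert(indices_are_distinct(idx, data.size()));
  // Non-Intel compilers ignore this attribute (typically with a warning).
  [[intelfpga::ivdep]]
  for (std::size_t i = 0; i < idx.size(); ++i) {
    data[idx[i]] *= factor;
  }
}
```

If `idx` could contain duplicates, the ivdep assertion would be wrong: a pipelined implementation might read an element before an earlier iteration's store to it completes, producing an incorrect result. Checking the precondition on the host (or by construction) is a cheap way to keep the attribute honest.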
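To put the II values mentioned above in perspective, the following sketch estimates the cycle count of a pipelined loop with the common approximation latency + (trip count - 1) × II. This ignores stalls and loop-exit overhead. The function name `pipelined_loop_cycles` and the latency value used in the example are illustrative assumptions, not figures reported by the compiler.

```cpp
#include <cstdint>

// Rough model of a pipelined loop: the first iteration needs the full
// pipeline latency, and each later iteration is launched II cycles after
// the previous one. Stalls and loop-exit overhead are ignored.
constexpr std::uint64_t pipelined_loop_cycles(std::uint64_t trip_count,
                                              std::uint64_t ii,
                                              std::uint64_t latency) {
  return trip_count == 0 ? 0 : latency + (trip_count - 1) * ii;
}
```

For example, with 1000 iterations and an assumed latency of 50 cycles, the model gives 1049 cycles at II=1, 2048 cycles at II=2, and 7043 cycles at II=7. For long loops, run time grows roughly in proportion to II, which is why avoiding false dependencies matters.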