double-buffering optimization can help minimize or remove gaps between consecutive kernels as they wait for host processing to finish. These gaps are minimized or removed by having the host perform processing operations on a second set of buffers while the kernel executes. With this execution order, the host processing is done by the time the next kernel can run, so kernel execution is not held up waiting for the host. For additional information, refer to the FPGA tutorial sample "Double Buffering to Overlap" listed in the
Intel® oneAPI Samples Browser on Linux* or
Intel® oneAPI Samples Browser on Windows*.
In this example, the first three kernels are run without the double-buffer optimization, and the next three are run with it. The kernels were run on an Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA when the intercept layer data was collected. The change made by this optimization can be clearly seen in the Intercept Layer for OpenCL* Applications trace:
Here, the kernel runs named
can be recognized as the bars with the largest execution time (the large orange bars). Double buffering removes the gaps between the kernel executions that can be seen in the top trace image. This optimization improves the throughput of the design.
The Intercept Layer for OpenCL* Applications makes the need for and benefits of the optimization clear. Use the Intercept Layer tool on your designs to identify scenarios where you can apply double buffering (and other optimizations).