Programming Guide

FPGA-CPU Interaction

One of the main influences on the overall performance of an FPGA design is how kernels executing on the FPGA interact with the host on the CPU.

Host and Kernel Interaction

FPGA devices typically communicate with the host (CPU) via PCIe.
[Figure: CPU with system memory connected to FPGA with private memory via PCIe link]
The characteristics of this PCIe link are an important factor in the performance of SYCL programs targeting FPGAs. Furthermore, the first time a particular DPC++ program runs, the FPGA must be configured with its hardware bitstream, which may take several seconds.

Data Transfer

Typically, the FPGA board has its own private DDR memory on which it primarily operates. The CPU must move all data that the kernel needs into the FPGA’s local DDR memory by bulk transfer or direct memory access (DMA). After the kernel completes its operations, the results must be transferred back to the CPU over DMA. The transfer speed is bound by the PCIe link itself and by the efficiency of the DMA solution. For example, the Intel® PAC with Intel® Arria® 10 GX FPGA has a PCIe Gen3 x8 link, and transfers are typically limited to 6-7 GB/s.
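The bandwidth figure above translates directly into a lower bound on transfer time. The following is a minimal sketch of that arithmetic in plain C++. The function names are hypothetical and not part of any SYCL or Intel API, and the model assumes a constant sustained bandwidth with no DMA setup latency.

```cpp
#include <cstdint>

// Rough lower bound, in seconds, on the time needed to move `bytes` over a
// link that sustains `gigabytes_per_second` (1 GB = 1e9 bytes here).
// Assumption: constant bandwidth, no DMA setup cost, no overlap with other work.
inline double transfer_seconds(std::uint64_t bytes, double gigabytes_per_second) {
  return static_cast<double>(bytes) / (gigabytes_per_second * 1e9);
}

// Round trip: inputs are copied to the FPGA's DDR memory before the kernel
// runs, and results are copied back to the host afterwards.
inline double round_trip_seconds(std::uint64_t bytes_in, std::uint64_t bytes_out,
                                 double gigabytes_per_second) {
  return transfer_seconds(bytes_in, gigabytes_per_second) +
         transfer_seconds(bytes_out, gigabytes_per_second);
}
```

For instance, under these assumptions, moving 3 GB of input at 6 GB/s takes at least about 0.5 seconds before the kernel can see all of its data.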
The following techniques help manage these data transfer times:
  • DPC++ allows buffers to be tagged as read-only or write-only, which allows some unnecessary transfers to be eliminated.
  • Improve the overall system efficiency by maximizing the number of concurrent operations. Since PCIe supports simultaneous transfers in opposite directions and PCIe transfers do not interfere with kernel execution, techniques such as double buffering can be applied. Refer to the Double Buffering Host Utilizing Kernel Invocation Queue topic in the Intel® oneAPI DPC++ FPGA Optimization Guide and the double_buffering tutorial for additional information about these techniques.
  • Improve data transfer throughput by prepinning system memory on board variants that support Restricted USM. Refer to the Prepinning topic in the Intel® oneAPI DPC++ FPGA Optimization Guide for additional information.
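To see why overlapping transfers with kernel execution helps, it can be useful to compare an idealized serial schedule with an idealized double-buffered one. The sketch below is a simplified two-stage pipeline model in plain C++, not the method from the Optimization Guide. The function names are hypothetical, and the model assumes every batch has the same host-side transfer time and the same kernel time.

```cpp
#include <algorithm>

// No overlap: each batch pays for its PCIe transfers and then its kernel run.
inline double serial_seconds(int batches, double transfer_s, double kernel_s) {
  if (batches <= 0) return 0.0;
  return batches * (transfer_s + kernel_s);
}

// Idealized double buffering: while the kernel processes one buffer, the host
// transfers the other. After the first transfer fills the pipeline, each
// additional batch costs only the slower of the two stages.
inline double double_buffered_seconds(int batches, double transfer_s,
                                      double kernel_s) {
  if (batches <= 0) return 0.0;
  return transfer_s + kernel_s +
         (batches - 1) * std::max(transfer_s, kernel_s);
}
```

In this model, the benefit is largest when transfer time and kernel time are similar. When one stage dominates, the total approaches the time of that stage alone.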

Configuration Time

You must program the hardware bitstream on the FPGA device in a process called configuration. Configuration is a lengthy operation requiring several seconds of communication with the FPGA device. The SYCL runtime manages configuration for you automatically and decides when it occurs. For example, configuration might be triggered when a kernel is first launched, but subsequent launches of the same kernel may not trigger configuration because the bitstream has not changed. Therefore, during development, Intel® recommends timing kernel execution only after the FPGA has been configured, for example, by performing a warm-up execution of the kernel before the timed execution. You must remove this warm-up execution from production code.
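The warm-up pattern described above can be sketched as a small timing helper in plain C++. The helper name and its `launch` parameter are hypothetical. `launch` stands for any callable that submits the kernel and blocks until it completes, for example a lambda that submits to a SYCL queue and waits on the returned event.

```cpp
#include <chrono>

// Runs `launch` once untimed, then times a second run and returns its
// duration in seconds.
// Assumption: `launch` blocks until the kernel has finished, so the measured
// interval covers the whole execution.
template <typename Launch>
double seconds_after_warmup(Launch &&launch) {
  launch();  // warm-up run: may include FPGA configuration, so it is not timed

  const auto start = std::chrono::steady_clock::now();
  launch();  // timed run: the bitstream is typically already loaded
  const auto stop = std::chrono::steady_clock::now();

  return std::chrono::duration<double>(stop - start).count();
}
```

Because the runtime decides when configuration occurs, a single warm-up run is a development-time heuristic rather than a guarantee. This helper belongs in benchmarking code only.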

Multiple Kernel Invocations

If a SYCL program submits the same kernel to a SYCL queue multiple times (for example, by calling single_task within a loop), only one kernel invocation is active at a time. Each subsequent invocation of the kernel waits for the previous run of the kernel to complete.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.