Developer Guide

Memory Accesses

Memory access efficiency often dictates the overall performance of your DPC++ kernel. Refer to Memory Types for an introduction to memory accesses.
The pipeline-parallel nature of DPC++ execution on FPGA means that memory loads and stores in your DPC++ code compete for access to memory resources (global, local, and private memories). If your DPC++ kernel performs a large number of memory accesses, the compiler must generate arbitration logic to share the available memory bandwidth between the memory access sites in your kernel's datapath. If the bandwidth demanded by the datapath exceeds what the memory and arbitration logic can provide, the datapath stalls. This degrades the kernel's throughput because the compute pipeline must wait for a memory access before resuming.
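One common way to reduce arbitration pressure is to reduce the number of access sites themselves: if a kernel reads the same global location several times, a single load into a private variable replaces several load sites with one. The sketch below illustrates the idea in plain C++ standing in for a DPC++ kernel body; the function names and data are hypothetical, and the DPC++ queue/accessor plumbing is omitted for brevity.

```cpp
#include <vector>

// Before: three separate reads of global[i] mean three load sites
// that the compiler must arbitrate against the rest of the datapath.
int sum_three_sites(const std::vector<int>& global, int i) {
    return global[i] + global[i] * 2 + global[i] * 3;
}

// After: one load site; all arithmetic operates on a private copy,
// which on FPGA maps to a register rather than a memory port.
int sum_one_site(const std::vector<int>& global, int i) {
    int v = global[i];          // single load from global memory
    return v + v * 2 + v * 3;   // reuse the private value
}
```

Both functions compute the same result; the second simply gives the compiler fewer memory access sites to arbitrate.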
When optimizing your design, it is important to understand whether your DPC++ kernel's throughput is limited by memory accesses (a memory-bound kernel) or by the structure of the kernel datapath (a compute-bound kernel). These situations require different optimization techniques. The following sections discuss memory access optimization in detail.
Consider the following when developing your DPC++ code:
  • The maximum computation bandwidth of an FPGA is much larger than the available global memory bandwidth.
  • The available global memory bandwidth is much smaller than the local and private memory bandwidth.
  • Because of the two points above, the number of global memory accesses your kernel performs should be minimized.
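A standard way to follow the last guideline is to copy a block of global data into on-chip (local or private) memory once, and then direct all repeated accesses at the on-chip copy. The following is a minimal sketch of that tiling pattern in plain C++; the tile size, function name, and data are illustrative assumptions, and in a real DPC++ kernel the local array would correspond to FPGA local memory.

```cpp
#include <vector>

constexpr int TILE = 8;

// Sum of products local[j] * local[k] over a TILE-sized block.
// The block is fetched from "global" memory exactly TILE times,
// while the TILE * TILE inner accesses all hit the on-chip copy.
long tile_sum_of_products(const std::vector<int>& global, int base) {
    int local[TILE];                        // stands in for FPGA local memory
    for (int j = 0; j < TILE; ++j)
        local[j] = global[base + j];        // TILE global loads, done once
    long acc = 0;
    for (int j = 0; j < TILE; ++j)
        for (int k = 0; k < TILE; ++k)
            acc += static_cast<long>(local[j]) * local[k];  // local accesses only
    return acc;
}
```

Without the copy, the inner loops would issue TILE * TILE global loads instead of TILE, multiplying the demand on global memory bandwidth.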

Product and Performance Information

1. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.