Developer Guide

Additional Recommendations

Optimizing memory accesses in your DPC++ kernels can improve overall kernel performance. Consider the following techniques:
  • Avoid designing systems where one kernel writes an intermediate result to global memory and another kernel reads this data back from global memory. Instead, implement a DPC++ pipe (described in Pipes) between the producer and consumer kernels for direct data transfer. Alternatively, you can merge both kernels into a single larger kernel and use helper functions to logically separate the two original kernels.
  • The Intel® oneAPI DPC++/C++ Compiler implements local memory in FPGAs differently than in GPUs. If your DPC++ kernel contains code to avoid GPU-specific local memory bank conflicts, remove that code because the compiler generates hardware that avoids local memory bank conflicts automatically whenever possible.
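
The kernel-merging alternative in the first recommendation can be sketched as follows. This is a host-only, plain C++ illustration of the code structure, not DPC++ device code: the names `produce`, `consume`, `run_two_stage`, and `run_fused` are hypothetical, and the `intermediate` vector only stands in for a global memory buffer. In a real design, the body of the fused loop would sit inside a single kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for the work of the original producer kernel.
inline float produce(float x) { return x * 2.0f; }

// Hypothetical helper for the work of the original consumer kernel.
inline float consume(float v) { return v + 1.0f; }

// Pattern to avoid: the first stage writes every intermediate result to a
// buffer (standing in for global memory) and the second stage reads it back.
std::vector<float> run_two_stage(const std::vector<float>& in) {
    std::vector<float> intermediate(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        intermediate[i] = produce(in[i]);
    }
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = consume(intermediate[i]);
    }
    return out;
}

// Merged version: one loop calls both helpers, so each intermediate value
// lives only in a local variable and is never stored in a buffer.
std::vector<float> run_fused(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float v = produce(in[i]);  // former producer kernel
        out[i] = consume(v);             // former consumer kernel
    }
    return out;
}
```

Both functions compute the same result. The helper functions keep the two original stages logically separate, while the fused loop removes the intermediate write and read.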
