Document Revision History

Document Revision History for the Intel® oneAPI DPC++ FPGA Optimization Guide
Date
Release Version
Changes
October 2020
2021.1-beta10
  • Added the following new topics:
    • Control Path
    • Load-Store Units
    • Load-Store Unit Styles
    • Load-Store Unit Modifiers
    • Load-Store Unit Controls
    • Annotating USM Pointers
    • Zero-Copy Memory Access
    • Pre-pinning
    • Fusing Adjacent Loops With Unequal Trip Counts (-Xsenable-unequal-tc-fusion)
    • Disable Automatic Fusion of Loops (-Xsdisable-auto-loop-fusion)
    • Partitioning Buffers Across Different Memory Types (Heterogeneous Memory)
  • Completely updated all of the FPGA concepts in the Introduction chapter.
  • Updated the following paths and namespaces:
    • Changed CL/sycl/intel to CL/sycl/INTEL.
    • Changed intel::fpga_selector to INTEL::fpga_selector.
    • Changed intel::lsu to INTEL::lsu.
    • Changed intel::fpga_reg to INTEL::fpga_reg.
    • Changed intel::pipe to INTEL::pipe.
  • Updated the Area Analysis of System report messages in Area Analysis of System.
  • Added a note and updated the description in unroll Pragma and ii Attribute.
  • Added a note about the SYCL implementation of math functions in Data Types and Operations.
  • Added descriptions of [[intelfpga::ivdep(array)]], [[intelfpga::ivdep(array, safelen)]], and [[intelfpga::ivdep(safelen, array)]] in ivdep Attribute.
  • Completely updated the topics Measure Kernel Performance and Instrument the Kernel Pipeline with Performance Counters (-Xsprofile).
September 2020
2021.1-beta09
  • Added the following new topics:
    • Omit Hardware to Support the no_global_work_offset attribute in parallel_for Kernels
    • loop_coalesce Attribute
    • speculated_iterations Attribute
    • disable_loop_pipelining Attribute
    • Reduce Area Resource Use While Profiling
    • Obtain Profiling Data During Runtime
    • max_interleaving Attribute
    • Partitioning Buffers Across Memory Channels of the Same Memory Type
    • Loop Speculation
  • Reorganized and updated all topics under Intel® FPGA Dynamic Profiler for DPC++. The following is a summary of the changes:
    • Added a new section, "Split Execution and Data Post Processing", to the Invoke the Profiler Runtime Wrapper to Obtain Profiling Data topic.
    • Completely rewrote the "Temporal Performance Collection" section of the Invoke the Profiler Runtime Wrapper to Obtain Profiling Data topic.
    • Removed the "Types of Information" table from the Use Intel® VTune™ Profiler topic.
    • Removed the Cache Hit topic.
    • Removed the "Low Bandwidth Efficiency" section from the Profiler Analyses of Example DPC++ Design Scenarios topic.
  • Updated the Memory Attributes topic to include the force_pow2_depth attribute.
  • Added a note about nested unrolling creating large blocks in unroll pragma.
  • Updated the compiler name to Intel® oneAPI DPC++/C++ Compiler.
  • Made a minor update to the Loop Fusion topic about relaxing the trip count condition.
  • Updated the Quick Reference section with new loop attributes and flags supported in this release.
  • Updated the Pipes Extension topic to include I/O pipes.
July 2020
2021.1-beta08
Bug fixes.
June 2020
2021.1-beta07
  • Added the following new topics:
    • Modify the handshaking protocol (-Xshyper-optimized-handshaking)
    • Omit hardware that generates and dispatches kernel IDs
    • Specify number of SIMD work-items
  • Updated the Loop Analysis topic to include the scheduler's behavior.
  • Updated the Memory Attributes topic to include the bank_bits attribute.
  • Removed all references to constant memory.
  • Removed the Configure Constant Memory Cache Size (-Xsconst-cache-bytes=<N>) optimization flag.
  • Updated a caution note in the ivdep attribute topic.
  • Updated the Specify a Work-Group Size topic to include the [[intelfpga::max_work_group_size(Z, Y, X)]] attribute.
May 2020
2021.1-beta06
  • Added the following new topics:
    • Cluster types such as Stall Enable Cluster (SEC) and Stall-Free Cluster (SFC).
    • Cluster characteristics.
    • Handshaking between clusters
    • Unrolling loops and conditional statements
    • Pipelining loops within a single work item
    • Pipelining loops across multiple work items
    • System-level profiling using the Intercept Layer for OpenCL Applications
    • Setting up the Intercept Layer for OpenCL Applications
    • Applying double-buffering using the Intercept Layer for OpenCL Applications
    • Force a single store ring to reduce area (-Xsforce-single-store-ring)
    • Force fewer read data reorder units to reduce area (-Xsnum-reorder)
    • Ignore dependencies between accessor arguments
  • Merged the FMAX II report with the Loops Analysis report.
  • Removed the -Xsno-accessor-aliasing flag.
  • Renamed the -Xsfmax flag to -Xsclock.
  • Removed the FMAX II report since it is deprecated.
  • Removed Duplicate the Store Ring to improve throughput (-Xsduplicate-ring).
  • Bug fixes and improvements.
March 2020
2021.1-beta05
Bug fixes and improvements
January 2020
2021.1-beta04
Bug fixes and improvements
October 2019
2021.1-beta03
Initial release

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804