Developer Guide

Executing Independent Operations Simultaneously

As described in Mapping Source Code Instructions to Hardware, the compiler can automatically identify independent operations and execute them simultaneously in hardware. Combined with pipelining (explained below), this is how the FPGA achieves performance through data parallelism.
The following image illustrates an example of an adder and a multiplier, which are scheduled to execute simultaneously while operating on separate inputs:
Automatic Vectorization in the Generated Hardware Datapath
This automatic vectorization is analogous to how a superscalar processor takes advantage of instruction-level parallelism, but it happens statically at compile time instead of dynamically at runtime. As a result, the generated hardware datapath carries no hardware or runtime cost for dependency checking. Additionally, because the logic and routing of an FPGA are flexible, the number of independent operations that can execute simultaneously is bounded only by the available FPGA resources (ALMs, DSPs, and so on).
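The difference between independent and dependent operations can be sketched in plain C++. The following is an illustrative sketch only: the type and function names (`SumAndProduct`, `add_and_multiply`, `add_then_multiply`) are hypothetical and do not come from the compiler or its libraries, and how the operations are actually scheduled depends on the compiler and the target device.

```cpp
#include <cstdint>

// Hypothetical result type holding the outputs of two independent operations.
struct SumAndProduct {
  std::int32_t sum;
  std::int32_t product;
};

// The addition reads only a and b; the multiplication reads only c and d.
// Neither result feeds the other, so the compiler is free to schedule the
// adder and the multiplier to execute simultaneously.
SumAndProduct add_and_multiply(std::int32_t a, std::int32_t b,
                               std::int32_t c, std::int32_t d) {
  SumAndProduct out;
  out.sum = a + b;      // independent operation 1
  out.product = c * d;  // independent operation 2
  return out;
}

// Contrast: the multiplication consumes the sum, so the multiplier cannot
// start until the adder has produced its result.
std::int32_t add_then_multiply(std::int32_t a, std::int32_t b,
                               std::int32_t c) {
  const std::int32_t sum = a + b;
  return sum * c;
}
```

In the first function the data dependencies are visible at compile time, which is why no dependency-checking hardware is needed in the generated datapath.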

Unrolling Loops

You can unroll loops in the design by using loop attributes. Unrolling reduces the number of loop iterations executed, but it consumes more hardware resources because the hardware for multiple iterations is instantiated and runs simultaneously. Once unrolled, the hardware resources are scheduled as described in Scheduling.
The Intel® oneAPI DPC++/C++ Compiler never attempts to unroll any loops in your source code automatically. You must always control loop unrolling by using the corresponding pragma. For more information, refer to Unroll Loops, Loop Directives, and the Unrolling Loops tutorial.
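As a rough illustration of what unrolling means, the sketch below compares a loop annotated with an unroll pragma against a version unrolled by hand with a factor of 4. It is plain host C++ rather than FPGA kernel code, and the names `kSize`, `sum_with_pragma`, and `sum_unrolled_by_hand` are hypothetical. The `#ifdef __clang__` guard is an assumption made only so the sketch builds cleanly with compilers that do not recognize the pragma; consult the referenced topics for the exact pragma syntax the compiler accepts in kernel code.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kSize = 16;  // hypothetical trip count, a multiple of 4

// Rolled loop with an unroll request: one addition per iteration in the
// source, with the pragma asking for four iterations to be replicated.
int sum_with_pragma(const std::array<int, kSize>& data) {
  int total = 0;
#ifdef __clang__
#pragma unroll 4
#endif
  for (std::size_t i = 0; i < kSize; ++i) {
    total += data[i];
  }
  return total;
}

// The same loop unrolled by hand: a quarter of the iterations, each one
// containing four copies of the loop body. On an FPGA this corresponds to
// more adders in hardware and fewer iterations to execute.
int sum_unrolled_by_hand(const std::array<int, kSize>& data) {
  int total = 0;
  for (std::size_t i = 0; i < kSize; i += 4) {
    total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
  }
  return total;
}
```

Both functions compute the same result; the difference is the trade-off described above between iteration count and the amount of replicated hardware.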

Conditional Statements

The Intel® oneAPI DPC++/C++ Compiler attempts to eliminate conditional or branch statements as much as possible. Conditionally executed code becomes predicated in the hardware. That is, branched instructions are replaced with conditionally executed instructions. Predication increases the possibilities for executing operations simultaneously and achieving better performance. Additionally, removing branches allows the compiler to apply other optimizations to the design.
Conditional Statement
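Predication can be pictured at the source level as computing both sides of a branch and then selecting one result. The sketch below is a conceptual illustration in plain C++, not the compiler's actual transformation; the function names `scale_branching` and `scale_predicated` and the arithmetic they perform are hypothetical.

```cpp
#include <cstdint>

// Branching form: only one of the two paths executes for a given input.
std::int32_t scale_branching(std::int32_t x, bool enable) {
  if (enable) {
    return x * 3;
  } else {
    return x + 1;
  }
}

// Predicated form: both candidate results are computed unconditionally, and
// the condition only selects which one is kept. The multiply and the add no
// longer depend on the condition, so they can execute simultaneously.
std::int32_t scale_predicated(std::int32_t x, bool enable) {
  const std::int32_t if_true = x * 3;
  const std::int32_t if_false = x + 1;
  return enable ? if_true : if_false;
}
```

The two functions return identical results for every input. The predicated form spends hardware on both paths in exchange for removing the control-flow dependency.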
