Developer Guide

unroll Pragma

Loop unrolling replicates a loop body multiple times and reduces the trip count of the loop. Unroll loops to reduce or eliminate loop control overhead on the FPGA. When there are no loop-carried dependencies and the Intel® oneAPI DPC++/C++ Compiler can execute loop iterations in parallel, unrolling can also reduce latency and overhead.
Unrolling nested loops with large bounds might generate a huge number of instructions, which can lead to very long compile times.
The compiler might unroll simple loops even if no pragma annotates them. To direct the compiler to unroll a loop, or to explicitly prevent it from unrolling a loop (specify an unroll factor of 1), insert an unroll pragma in the kernel code immediately before the loop. To specify an unroll factor N, use the optional unroll factor specifier #pragma unroll <N>. For more information, see the Determining the Correct Unroll Factor section in the Unrolling Loops FPGA tutorial.
Syntax
#pragma unroll
#pragma unroll N
If you specify the unroll factor N, it must be a positive constant expression of integer type. If you omit the unroll factor N, the compiler attempts to unroll the loop fully.
Examples
The following is an example of full loop unrolling:
// Before unrolling the loop
#pragma unroll
for (int i = 0; i < 5; i++) {
  a[i] += 1;
}
// After fully unrolling the loop by a factor of 5,
// the loop is flattened. There is no loop after unrolling.
a[0] += 1;
a[1] += 1;
a[2] += 1;
a[3] += 1;
a[4] += 1;
A full unroll is the special case in which the unroll factor equals the number of loop iterations.
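As a minimal sketch in standard C++17 (no SYCL or FPGA toolchain assumed), the code below contrasts the rolled five-iteration loop with a fully unrolled version generated at compile time. The unrolled form uses a fold expression over std::index_sequence, which expands to exactly the five statements shown in the example. The function names are hypothetical, and this illustrates the transformation itself, not the code the compiler emits.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Rolled version: the original loop with a trip count of 5.
void add_one_rolled(std::array<int, 5>& a) {
    for (std::size_t i = 0; i < a.size(); i++) {
        a[i] += 1;
    }
}

// The pack expansion emits one statement per index I, so for N == 5 the body is
// a[0] += 1; a[1] += 1; a[2] += 1; a[3] += 1; a[4] += 1; with no loop left.
template <std::size_t N, std::size_t... I>
void add_one_unrolled_impl(std::array<int, N>& a, std::index_sequence<I...>) {
    ((a[I] += 1), ...);
}

// Hypothetical wrapper: the trip count (5) is a compile-time constant,
// which is what makes a full unroll possible here.
void add_one_unrolled(std::array<int, 5>& a) {
    add_one_unrolled_impl(a, std::make_index_sequence<5>{});
}
```

Because the trip count is a template argument, this style of full unroll only works when the count is known at compile time, which mirrors the compiler's need to understand the trip count before it can unroll fully.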
The following is an example of partial loop unrolling:
// Before unrolling the loop
#pragma unroll 4
for (int i = 0; i < 20; i++) {
  a[i] += 1;
}
// After the loop is unrolled by a factor of 4,
// the loop has five (20 / 4) iterations.
for (int i = 0; i < 5; i++) {
  a[i * 4] += 1;
  a[i * 4 + 1] += 1;
  a[i * 4 + 2] += 1;
  a[i * 4 + 3] += 1;
}
In the partial unroll example, each iteration of the unrolled loop does the work of four original iterations. The Intel® oneAPI DPC++/C++ Compiler instantiates four adders instead of one. Because there is no data dependency between iterations of this loop, the compiler can execute the four additions in parallel.
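The following sketch, in plain C++ with hypothetical function names, applies the same factor-4 partial unroll by hand and can be compared against the rolled loop. The remainder loop is an assumption added for trip counts that are not a multiple of 4; the 20-iteration example above never needs it.

```cpp
#include <cstddef>
#include <vector>

// Rolled reference loop.
void add_one_rolled(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); i++) {
        a[i] += 1;
    }
}

// Hand-unrolled by a factor of 4. The four statements in the body update
// different elements, so none of them depends on another; this independence
// is what allows four adders to work in parallel.
void add_one_unrolled_by_4(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i] += 1;
        a[i + 1] += 1;
        a[i + 2] += 1;
        a[i + 3] += 1;
    }
    // Remainder loop (assumption of this sketch): handles the last n % 4
    // elements when the trip count is not a multiple of 4.
    for (; i < n; i++) {
        a[i] += 1;
    }
}
```

For a 20-element vector the main loop runs five times and the remainder loop does nothing, matching the unrolled form shown above.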
For additional information, refer to the FPGA tutorial sample "Loop Unroll" listed in the Intel® oneAPI Samples Browser on Linux* or Intel® oneAPI Samples Browser on Windows*.
Notes
  • Provide an unroll factor whenever possible. To specify an unroll factor N, insert the #pragma unroll <N> directive before a loop in your kernel code. The Intel® oneAPI DPC++/C++ Compiler attempts to unroll the loop by a factor of at most N. Consider the following code fragment. By specifying an unroll factor of 2, you direct the compiler to replicate the loop body twice, so the loop runs two iterations instead of four.
    #pragma unroll 2
    for (size_t k = 0; k < 4; k++) {
      mac += data_in[(gid * 4) + k] * coeff[k];
    }
    For more information, see the Determining the Correct Unroll Factor section in the Unrolling Loops FPGA tutorial.
  • To unroll a loop fully, you can omit the unroll factor and insert only the #pragma unroll directive before a loop in your kernel code. The compiler attempts to unroll the loop fully if it can determine the trip count, and issues a warning if it cannot fulfill the unroll request.
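To illustrate what an unroll factor of 2 means for the mac fragment in the Notes, here is a hedged sketch in standard C++ with hypothetical function names. The names data_in, coeff, and gid follow the fragment; the float element type and the zero-initialized mac are assumptions. The body is replicated twice by hand while keeping the original order of updates to mac.

```cpp
#include <cstddef>

// Rolled reference: the Notes fragment wrapped in a hypothetical function.
float mac_rolled(const float* data_in, const float* coeff, std::size_t gid) {
    float mac = 0.0f;  // assumption: the accumulator starts at zero
    for (std::size_t k = 0; k < 4; k++) {
        mac += data_in[(gid * 4) + k] * coeff[k];
    }
    return mac;
}

// Unrolled by a factor of 2: the body appears twice and the trip count drops
// from 4 to 2. Both copies still update mac in the original order, because
// mac carries a dependency from one iteration to the next.
float mac_unrolled_by_2(const float* data_in, const float* coeff, std::size_t gid) {
    float mac = 0.0f;
    for (std::size_t k = 0; k < 4; k += 2) {
        mac += data_in[(gid * 4) + k] * coeff[k];
        mac += data_in[(gid * 4) + k + 1] * coeff[k + 1];
    }
    return mac;
}
```

Unlike the a[i] += 1 examples, every copy of this body reads and writes mac, so the multiplications are independent but the additions into mac form a dependent chain.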

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804