Developer Guide

Contents

Loop Speculation

Loop speculation is an optimization technique that enables more efficient loop pipelining by allowing future iterations to be initiated before determining whether the loop was exited already. Consider the following simple loop example:
while (m*m*m < N) { m += 1; }
Logically, the exit condition (
m*m*m < N
) for an iteration must be evaluated before determining whether you need to initiate another iteration or not. This means that, in the absence of speculation, the loop II cannot be lower than the number of cycles it takes to compute this exit condition. Speculated iterations are iterations that launch before the exit condition computation has completed. However, all operations with side-effects, such as stores to memory, are predicated by the exit condition. This means that operations with side-effects still waits for the exit condition to be computed. Loop speculation is beneficial when the exit condition is the bottleneck preventing from achieving a lower II. In the loop shown above, the exit condition contains two multiplications that cannot complete within a single clock cycle. However, loop speculation allows this loop to achieve II=1.
For example, for a given iteration
i
with exit condition
Ei
, the number of speculated iterations
s
is the number of iterations after
i
has been initiated but before
Ei
has been evaluated. By default, this number of speculated iterations is determined by the compiler on a per-loop basis, and can be found in the per-loop details of the Loop Analysis report.
The
speculated_iterations
attribute allows you to directly control the number of speculated iterations for a loop. If the exit condition calculation is the bottleneck to lowering II (as shown in the Loop Analysis report), increasing the number of speculated iterations may improve the II (this is not guaranteed as other bottlenecks may be uncovered). For more information, refer to speculated_iterations Attribute.
Speculated iterations introduce some overhead in nested loops since a new invocation of a loop may not begin until all speculated iterations of its previous invocation have completed. In cases where a loop body with low latency is expected to be frequently invoked, (for example, an inner loop with a short trip count), use the
speculated_iterations
attribute to reduce the number of speculated iterations. You can estimate the amount of this overhead by multiplying the number of speculated iterations with II of the loop (as shown in the Loop Analysis report). Using the speculated_iterations attribute can reduce this overhead, but be aware that choosing an attribute value that is too low may increase the II (due to not having enough time to evaluate the exit condition).
Consider the following example:
while (m*m*m < N) {       m += 1;   }   dst[0] = m; [[intelfpga::speculated_iterations(7)]]   while (m*m*m < N) {       m += 1;   }   dst[0] = m; [[intelfpga::speculated_iterations(0)]]   while (m*m*m < N) {       m += 1;   }   dst[0] = m;
In this example, the exit condition that has two multiplies and a compare is the bottleneck preventing II=1. The compiler's choice of four speculated iterations result in II=2 since the exit condition takes seven cycles (each multiply takes three cycles and the compare takes one cycle) and four speculated iterations times two-cycle II gives eight cycles to cover this evaluation. Then, the speculated iterations are increased to seven to cover the seven-cycle exit condition calculation allowing us to achieve II=1. By setting the speculated_iterations attribute to 0, you can verify that the II has increased to 7, which matches the exit condition bottleneck.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804