Developer Guide

Strategies for Inferring the Accumulator

To leverage the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code to improve efficiency or work around programming restrictions.

Describe an Accumulator Using Multiple Loops

Consider a case where you want to describe an accumulator using multiple loops, with some of the loops being unrolled:
float acc = 0.0f;
for (i = 0; i < k; i++) {
  #pragma unroll
  for (j = 0; j < 16; j++)
    acc += (x[i+j]*y[i+j]);
}
In this situation, it is important to compile the kernel with the -Xsfp-relaxed compiler command option to enable the Intel® oneAPI DPC++/C++ Compiler to rearrange operations in a way that exposes the accumulation. If you do not compile the kernel with -Xsfp-relaxed, the resulting accumulator structure has a high initiation interval (II). II is the number of cycles between launching successive loop iterations. The higher the II value, the longer the accumulator structure must wait before it can process the next loop iteration.
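To make the cost of a high II concrete, the following is a minimal sketch of a simplified cycle model for a pipelined loop. The helper name estimate_loop_cycles and its latency parameter are hypothetical and chosen for illustration; they are not part of the compiler or its reports. The model assumes that a new iteration launches every II cycles and that each iteration takes a fixed number of cycles to complete.

```cpp
#include <cstdint>

// Hypothetical helper (not a compiler API): rough cycle estimate for a
// pipelined loop under a simplified model.
//   iterations : number of loop iterations
//   ii         : initiation interval, cycles between successive launches
//   latency    : assumed cycles for one iteration to finish once launched
// The last iteration launches at cycle ii * (iterations - 1) and finishes
// `latency` cycles later.
std::uint64_t estimate_loop_cycles(std::uint64_t iterations,
                                   std::uint64_t ii,
                                   std::uint64_t latency) {
    if (iterations == 0) {
        return 0;  // nothing to execute
    }
    return latency + ii * (iterations - 1);
}
```

Under this model, and with an assumed latency of 10 cycles, 1000 iterations take 1009 cycles at II = 1 but 4006 cycles at II = 4, so for long loops the total scales almost linearly with II.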

Modify a Multi-Loop Accumulator Description

In cases where you cannot compile an accumulator description using the -Xsfp-relaxed compiler command option, rewrite the code to expose the accumulation. For the code example above, rewrite it in the following manner:
float acc = 0.0f;
for (i = 0; i < k; i++) {
  float my_dot = 0.0f;
  #pragma unroll
  for (j = 0; j < 16; j++)
    my_dot += (x[i+j]*y[i+j]);
  acc += my_dot;
}
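The two forms can be compared on the host with ordinary C++. The sketch below is an illustration only: the function names accumulate_direct and accumulate_partial are hypothetical, the code is plain host code rather than a kernel, and the unroll pragma is omitted because it has no bearing on the numerical result. Note that the rewrite changes how the floating-point additions are grouped, so for general inputs the two results can differ in the low-order bits. This regrouping is the kind of rearrangement the compiler does not perform on its own unless relaxed floating-point ordering is allowed.

```cpp
#include <cstddef>
#include <vector>

// Original form: a single accumulator updated directly in the inner loop.
// x and y must each hold at least k + 15 elements, because the inner loop
// reads x[i + j] and y[i + j] for j up to 15.
float accumulate_direct(const std::vector<float>& x,
                        const std::vector<float>& y,
                        std::size_t k) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++) {
        for (std::size_t j = 0; j < 16; j++) {
            acc += x[i + j] * y[i + j];
        }
    }
    return acc;
}

// Rewritten form: the inner loop builds a partial dot product, and the
// outer loop performs exactly one accumulation per iteration.
float accumulate_partial(const std::vector<float>& x,
                         const std::vector<float>& y,
                         std::size_t k) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++) {
        float my_dot = 0.0f;
        for (std::size_t j = 0; j < 16; j++) {
            my_dot += x[i + j] * y[i + j];
        }
        acc += my_dot;  // the only update of acc in the outer loop
    }
    return acc;
}
```

For inputs whose products and sums are exactly representable, such as x filled with 1.0f and y filled with 2.0f, both functions return the same value. For arbitrary data, compare the results with a tolerance rather than for exact equality.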

Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value

Consider a situation where you want an accumulator to start from a variable or non-zero initial value, for example to apply an offset to the sum:
float acc = array[0];
for (i = 0; i < k; i++) {
  acc += x[i];
}
Because the accumulator hardware does not support variable or non-zero initial values in a description, you must rewrite the description as follows:
float acc = 0.0f;
for (i = 0; i < k; i++) {
  acc += x[i];
}
acc += array[0];
Rewriting the description in this manner enables the kernel to use an accumulator in the loop. The loop is then followed by a single addition of array[0] to the accumulated result.
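The same pattern can be written as a small host-side function. This is a minimal sketch, assuming the data is available as a std::vector and the offset is passed in as a separate value; the name sum_with_offset is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sums the elements of x starting from zero, then applies the offset once
// after the loop, so the loop-carried accumulator never starts from a
// variable or non-zero value.
float sum_with_offset(const std::vector<float>& x, float initial) {
    float acc = 0.0f;  // zero initial value inside the accumulation
    for (std::size_t i = 0; i < x.size(); i++) {
        acc += x[i];
    }
    acc += initial;  // offset added once, outside the loop
    return acc;
}
```

Because floating-point addition is not associative, adding the offset last can produce a result that differs slightly from starting the sum at the offset. If bit-exact agreement with the original ordering matters, verify the results on representative data.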
