Developer Guide

Strategies for Inferring the Accumulator

To take advantage of the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code to improve efficiency or to work around programming restrictions.

Describe an Accumulator Using Multiple Loops

Consider a case where you want to describe an accumulator using multiple loops, with some of the loops being unrolled:
    float acc = 0.0f;
    for (i = 0; i < k; i++) {
        #pragma unroll
        for (j = 0; j < 16; j++)
            acc += (x[i+j]*y[i+j]);
    }
With fast math enabled by default, the Intel® oneAPI DPC++/C++ Compiler automatically rearranges operations in a way that exposes the accumulation.
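
The reason rearrangement depends on fast math is that floating-point addition is not associative, so reordering the additions can change the rounded result. The following host-only C++ sketch illustrates this with two hypothetical helper functions (sum_left_to_right and sum_right_to_left are names chosen for this illustration and are not part of any Intel API). It does not show hardware accumulator inference.

```cpp
// Adds three floats as (a + b) + c.
// The volatile intermediate forces rounding to 32-bit float precision.
float sum_left_to_right(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

// Adds the same three floats as a + (b + c).
float sum_right_to_left(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}
```

With a = 1.0e8f, b = -1.0e8f, and c = 1.0f, the first ordering cancels the large terms before adding c, while the second ordering loses c when it is rounded into the much larger b. A strict floating-point mode must preserve the order written in the source, which is why the compiler needs a relaxed mode to regroup the additions.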

Modify a Multi-Loop Accumulator Description

In cases where you cannot compile an accumulator description using the -Xsfp-relaxed compiler command option, rewrite the code to expose the accumulation.
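Rewrite the code example above as follows, so that each pass of the unrolled inner loop produces a partial result that is added to the accumulator once per outer-loop iteration: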
    float acc = 0.0f;
    for (i = 0; i < k; i++) {
        float my_dot = 0.0f;
        #pragma unroll
        for (j = 0; j < 16; j++)
            my_dot += (x[i+j]*y[i+j]);
        acc += my_dot;
    }
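
As a rough check of this rewrite, the two loop structures can be compared on the host. The sketch below is plain C++ rather than kernel code, and it makes several assumptions: the function names accumulate_direct and accumulate_partial are hypothetical, x and y are taken to be std::vector<float> holding at least k + 15 elements, and the unroll pragma is omitted because it has no bearing on the numerical result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: every product is added straight into acc.
float accumulate_direct(const std::vector<float>& x,
                        const std::vector<float>& y, std::size_t k) {
    // The inner loop reads up to index (k - 1) + 15.
    assert(k == 0 || (x.size() >= k + 15 && y.size() >= k + 15));
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++) {
        for (std::size_t j = 0; j < 16; j++)
            acc += x[i + j] * y[i + j];
    }
    return acc;
}

// Rewritten form: the inner loop fills a local partial sum (my_dot),
// and acc receives exactly one addition per outer-loop iteration.
float accumulate_partial(const std::vector<float>& x,
                         const std::vector<float>& y, std::size_t k) {
    assert(k == 0 || (x.size() >= k + 15 && y.size() >= k + 15));
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++) {
        float my_dot = 0.0f;
        for (std::size_t j = 0; j < 16; j++)
            my_dot += x[i + j] * y[i + j];
        acc += my_dot;
    }
    return acc;
}
```

For inputs whose intermediate sums are exactly representable, such as small integers, both functions return the same value. For general inputs the results may differ in the last bits, because the additions are grouped differently.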

Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value

Consider a situation where you might want to apply an offset to a description of an accumulator that begins with a non-zero value:
    float acc = array[0];
    for (i = 0; i < k; i++) {
        acc += x[i];
    }
Because the accumulator hardware does not support variable or non-zero initial values in a description, you must rewrite the description so that the accumulator starts at zero:
    float acc = 0.0f;
    for (i = 0; i < k; i++) {
        acc += x[i];
    }
    acc += array[0];
Rewriting the description in this manner enables the kernel to use an accumulator in the loop. After the loop completes, array[0] is added to the accumulated result.
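
The equivalence of the two descriptions can be sketched on the host as follows. This is plain C++ under stated assumptions, not kernel code: the function names sum_seeded and sum_with_offset are hypothetical, and x and array are taken to be std::vector<float> with array holding at least one element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: the accumulator starts from array[0].
float sum_seeded(const std::vector<float>& x,
                 const std::vector<float>& array, std::size_t k) {
    assert(k <= x.size() && !array.empty());
    float acc = array[0];
    for (std::size_t i = 0; i < k; i++)
        acc += x[i];
    return acc;
}

// Rewritten form: the accumulator starts from zero, and array[0]
// is added once after the loop has finished.
float sum_with_offset(const std::vector<float>& x,
                      const std::vector<float>& array, std::size_t k) {
    assert(k <= x.size() && !array.empty());
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++)
        acc += x[i];
    acc += array[0];
    return acc;
}
```

Both functions compute the same mathematical sum. Because the offset is added at a different point in the sequence, the rounded floating-point results can differ slightly for inputs that are not exactly representable.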
