Inferring a Shift Register
The shift register is an important design pattern for implementing many applications efficiently on an FPGA. However, the way you code a shift register might seem counterintuitive at first.
Consider the following code example:
using InPipe = INTEL::pipe<class PipeIn, int, 4>;
using OutPipe = INTEL::pipe<class PipeOut, int, 4>;
#define SIZE 512
// Shift register size must be statically determinable
// This function is used in a kernel
void foo() {
  int shift_reg[SIZE];
  // The key is that the array size is a compile-time constant
  // Initialization loop
  #pragma unroll
  for (int i = 0; i < SIZE; i++) {
    // All elements of the array should be initialized to the same value
    shift_reg[i] = 0;
  }
  while (1) {
    // Fully unrolling the shifting loop produces constant accesses
    #pragma unroll
    for (int j = 0; j < SIZE - 1; j++) {
      shift_reg[j] = shift_reg[j + 1];
    }
    shift_reg[SIZE - 1] = InPipe::read();
    // Using fixed access points of the shift register
    int res = (shift_reg[0] + shift_reg[1]) / 2;
    // The 'out' pipe receives the running average of the input pipe
    OutPipe::write(res);
  }
}
In each clock cycle, the kernel shifts a new value into the array. By placing this shift register into a block RAM, the compiler can efficiently handle multiple access points into the array. The shift register design pattern is ideal for implementing filters (for example, image filters like a Sobel filter or time-delay filters like a finite impulse response (FIR) filter).
When implementing a shift register in your kernel code, remember the following key points:
Unroll the shifting loop so that it can access every element of the array.
All access points must use constant array indices. For example, if you write a calculation in nested loops using multiple access points, unroll these loops to establish the constant access points.
Initialize all elements of the array to the same value. Alternatively, you may leave the elements uninitialized if you do not require a specific initial value.
If some accesses to a large array are not inferable statically, they force the compiler to create inefficient hardware. If these accesses are necessary, use local memory instead of private memory.
Do not shift a large shift register conditionally. To avoid creating inefficient hardware, the shifting must occur in every iteration of the loop that contains the shifting code.
Conditionally shifting large shift registers inside pipelined loops leads to the creation of inefficient hardware.
For example, the following kernel consumes more resources when the if (K > 5) condition is present:
#define SHIFT_REG_LEN 1024
void bad_shift_reg(accessor<int, access::mode::read, access::target::global_buffer> src,
                   accessor<int, access::mode::write, access::target::global_buffer> dst,
                   int K) {
  int shift_reg[SHIFT_REG_LEN];
  int sum = 0;
  for (unsigned i = 0; i < K; i++) {
    sum += shift_reg[0];
    shift_reg[SHIFT_REG_LEN - 1] = src[i];
    // This condition will cause severe area bloat.
    if (K > 5) {
      #pragma unroll
      for (int m = 0; m < SHIFT_REG_LEN - 1; m++) {
        shift_reg[m] = shift_reg[m + 1];
      }
    }
    dst[i] = sum;
  }
}
If it is necessary to implement conditional shifting of a large shift register in your kernel, consider modifying your code so that it uses local memory.