Developer Guide

Pipelining Loops in Non-task Kernels (-Xsauto-pipeline)

To direct the Intel® oneAPI DPC++/C++ Compiler to compile your design and pipeline loops in non-task (parallel_for) kernels, include the -Xsauto-pipeline option in your dpcpp command. The host program invokes non-task kernels through the kernel execution function parallel_for, parallel_for_work_item, or parallel_for_work_group.
Example
dpcpp -fintelfpga -Xshardware -Xsauto-pipeline <source_file>.cpp
With the -Xsauto-pipeline option, the compiler attempts to pipeline the loops in your design, but pipelining is not guaranteed. If you do not include the -Xsauto-pipeline option, the compiler does not pipeline the loops in parallel_for kernels, although it still executes different work items in parallel.
The -Xsauto-pipeline option might improve or degrade performance depending on the memory access pattern in your design.
  • If the auto-pipelining is successful, the Loop Analysis report displays the messages "Auto-pipelined parallel_for" and "parallel_for rewritten as a pipelined single_task" (Details pane). The compiler-generated loops appear marked as "Compiler generated auto-pipeline loop" in the report.
  • If the compiler chooses not to auto-pipeline the loops, the Loop Analysis report displays a message for the kernel. The compiler declines to auto-pipeline for one of the following reasons:
    • A barrier in the function is not at the top-level function scope.
    • The kernel uses local or private memory.
    • The kernel uses volatile or atomic memory, or channels.
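The report message "parallel_for rewritten as a pipelined single_task" can be pictured with a small host-side model. This is only a conceptual sketch in plain C++, not the compiler's actual transformation and not SYCL code; the names body, as_parallel_for, and as_single_task are hypothetical and chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-work-item computation; stands in for a kernel body.
inline int body(int x) { return 2 * x + 1; }

// Model of a parallel_for kernel: body runs once per work item, and the
// order between work items is not part of the contract (this model visits
// them in reverse to make that point).
std::vector<int> as_parallel_for(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  for (std::size_t i = in.size(); i-- > 0;) out[i] = body(in[i]);
  return out;
}

// Model of the rewritten form: a single task that contains one loop over
// the work-item ids. A loop like this is what a pipeliner can schedule.
std::vector<int> as_single_task(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = body(in[i]);
  return out;
}
```

Both functions produce the same output for independent work items, which is the property that makes such a rewrite possible in the first place.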
If you do not want the compiler to pipeline some infrequently used loops while still allowing other loops to be auto-pipelined, apply the [[intel::disable_loop_pipelining]] loop directive to those specific loops when using the -Xsauto-pipeline option. The directive disables pipelining only for the loop it annotates.
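As a minimal sketch of the directive's placement, the following shows a parallel_for kernel body with two inner loops, only one of which opts out of pipelining. The names work_item, run_kernel_sketch, and kSetupIters are hypothetical. The sketch assumes a SYCL 2020 toolchain that provides <sycl/sycl.hpp> and predefines SYCL_LANGUAGE_VERSION; without one, it falls back to a plain host loop so that the code still builds, and the attribute is then ignored. Whether the compiler actually auto-pipelines the unannotated loop is not guaranteed.

```cpp
#include <cstddef>
#include <vector>

// Use the SYCL path only when compiling as SYCL with the header available
// (assumption: the compiler predefines SYCL_LANGUAGE_VERSION in that mode).
#if defined(SYCL_LANGUAGE_VERSION) && __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define SKETCH_HAS_SYCL 1
#else
#define SKETCH_HAS_SYCL 0
#endif

constexpr int kSetupIters = 4;  // hypothetical trip count of the rarely used loop

// Hypothetical per-work-item computation with two inner loops.
inline int work_item(int x) {
  int bias = 0;
  // Infrequently used loop: opt out of pipelining for this loop only.
  // Compilers that do not know the attribute ignore it (with a warning).
  [[intel::disable_loop_pipelining]]
  for (int k = 0; k < kSetupIters; ++k) bias += k;

  int acc = 0;
  // Unannotated loop: remains a candidate for auto-pipelining.
  for (int k = 1; k <= x; ++k) acc += k;
  return acc + bias;
}

// Applies work_item to every element, as a parallel_for kernel when SYCL
// is available and as a host loop otherwise.
std::vector<int> run_kernel_sketch(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  if (in.empty()) return out;
#if SKETCH_HAS_SYCL
  sycl::queue q;  // default device; an FPGA design would select an FPGA device
  {
    sycl::buffer<int, 1> in_buf(in.data(), sycl::range<1>(in.size()));
    sycl::buffer<int, 1> out_buf(out.data(), sycl::range<1>(out.size()));
    q.submit([&](sycl::handler& h) {
      sycl::accessor src(in_buf, h, sycl::read_only);
      sycl::accessor dst(out_buf, h, sycl::write_only);
      h.parallel_for(sycl::range<1>(in.size()),
                     [=](sycl::id<1> i) { dst[i] = work_item(src[i]); });
    });
  }  // buffers are destroyed here, which copies the results back to out
#else
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = work_item(in[i]);
#endif
  return out;
}
```

The attribute sits directly before the for statement it controls, so the second loop is unaffected and remains eligible for auto-pipelining.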
