Developer Guide

max_concurrency Attribute

Use the max_concurrency attribute to limit the concurrency of a loop in your kernel. The concurrency of a loop is the number of iterations of that loop that can be in progress at one time. By default, the Intel® oneAPI DPC++/C++ Compiler tries to maximize the concurrency of loops so that your kernel runs at peak throughput.
Syntax
[[intel::max_concurrency(n)]]
The max_concurrency attribute applies to pipelined loops in single task kernels. Refer to Pipelining for information about loop pipelining.
The max_concurrency attribute enables you to control the on-chip memory resources required to pipeline your loop. To achieve simultaneous execution of loop iterations, the Intel® oneAPI DPC++/C++ Compiler must create copies of any memory that is private to a single iteration. These copies are called private copies. The greater the permitted concurrency, the more private copies the compiler must create.
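To make the relationship between concurrency and memory concrete, the following host-side sketch estimates how the footprint of one loop-scoped array grows with the permitted concurrency. The helper PrivateCopyBytes is hypothetical and not part of any Intel API. It assumes exactly one private copy per in-flight iteration and ignores any rounding or banking the compiler might apply, so treat the numbers as a rough model rather than what the reports will show.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of any Intel API. Estimates the bytes of
// on-chip memory used by the private copies of one loop-scoped array,
// assuming one private copy per iteration that may be in flight.
std::size_t PrivateCopyBytes(std::size_t concurrency, std::size_t elements,
                             std::size_t element_bytes) {
  // This sketch models a bounded, nonzero concurrency only.
  assert(concurrency > 0);
  return concurrency * elements * element_bytes;
}
```

Under these assumptions, a 1024-element array of 4-byte integers needs 4096 bytes at a concurrency of 1 and 32768 bytes at a concurrency of 8.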
The attribute parameter n is required and must be a non-negative constant expression of integer type. The parameter directs the compiler to restrict the loop’s concurrency to n simultaneous iterations.
The kernel’s report.html (Review the report.html File) provides the following information pertaining to loop concurrency:
  • Maximum concurrency that the Intel® oneAPI DPC++/C++ Compiler has chosen: This information is available in the Loops Analysis report and Kernel Memory Viewer.
    • In the Loops Analysis report, a message in the Details pane reports that the maximum number of simultaneous executions has been limited to N. The unsigned value N can be greater than or equal to zero. A value of N = 0 indicates unlimited concurrency.
    • In the Kernel Memory Viewer, the bank view of your local memory graphically shows the number of private copies.
  • Impact on memory usage: This information is available in the Area Analysis of System report. A message in the Details pane reports that the Intel® oneAPI DPC++/C++ Compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.
If you want to exchange some performance for physical memory savings, apply [[intel::max_concurrency(n)]] to the loop, as shown in the following code snippet:

    [[intel::max_concurrency(1)]]
    for (int i = 0; i < N; i++) {
      int arr[M];
      // Doing work on arr
    }

When you apply this attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to n. The number of private copies of loop-scoped memories is also restricted to n.
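
As a fuller illustration, here is a minimal sketch of a loop body that carries the attribute and still compiles with an ordinary host compiler. The macro LOOP_MAX_CONCURRENCY, the function RowSumOfSquares, and the sizes kRows and kCols are hypothetical names chosen for this example. The sketch assumes that the __INTEL_LLVM_COMPILER macro identifies the Intel® oneAPI DPC++/C++ Compiler. In an FPGA design, the loop would sit inside a single task kernel rather than a plain function.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: __INTEL_LLVM_COMPILER identifies the Intel oneAPI DPC++/C++
// Compiler. Other compilers see an empty macro, so the attribute is dropped.
#if defined(__INTEL_LLVM_COMPILER)
#define LOOP_MAX_CONCURRENCY(n) [[intel::max_concurrency(n)]]
#else
#define LOOP_MAX_CONCURRENCY(n)
#endif

constexpr int kRows = 4;  // hypothetical outer trip count
constexpr int kCols = 8;  // hypothetical size of the loop-scoped array

// Hypothetical kernel body: writes the sum of squares of each input row.
// `in` holds kRows * kCols values in row-major order; `out` holds kRows.
void RowSumOfSquares(const int* in, int* out) {
  assert(in != nullptr && out != nullptr);
  LOOP_MAX_CONCURRENCY(1)
  for (int i = 0; i < kRows; i++) {
    int arr[kCols];  // private to one iteration of the outer loop
    for (int j = 0; j < kCols; j++) {
      arr[j] = in[i * kCols + j] * in[i * kCols + j];
    }
    int sum = 0;
    for (int j = 0; j < kCols; j++) {
      sum += arr[j];
    }
    out[i] = sum;
  }
}
```

With a concurrency of 1, only one copy of arr should be needed, at the cost of the outer loop iterations no longer overlapping.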
You can also control the number of private copies created for a local memory that is accessed within a loop by using [[intel::private_copies(N)]]. If a local memory with [[intel::private_copies(N)]] is accessed within a loop that has the [[intel::max_concurrency(M)]] attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to min(M,N). For more information about [[intel::private_copies(N)]], refer to FPGA Memory Attributes.
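
The min(M,N) rule can be sketched as a small host-side helper. EffectiveConcurrency is a hypothetical function written for this illustration, not an Intel API, and it assumes both attribute values are positive. The special value 0 is deliberately not modeled here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not an Intel API. Models the documented rule for a
// loop annotated with [[intel::max_concurrency(M)]] that accesses a memory
// declared with [[intel::private_copies(N)]], for example:
//
//   [[intel::max_concurrency(4)]]
//   for (...) { [[intel::private_copies(2)]] int arr[64]; /* ... */ }
//
// Assumption: M and N are both positive; the value 0 is not handled.
int EffectiveConcurrency(int max_concurrency_m, int private_copies_n) {
  assert(max_concurrency_m > 0 && private_copies_n > 0);
  return std::min(max_concurrency_m, private_copies_n);
}
```

For the commented example, EffectiveConcurrency(4, 2) returns 2, so the smaller of the two limits is the one that takes effect.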
For additional information, refer to the FPGA tutorial sample “Loop Max Concurrency” listed in the Intel® oneAPI Samples Browser on Linux* or Intel® oneAPI Samples Browser on Windows*, or access the code sample on GitHub.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.