max_concurrency Attribute
Use the max_concurrency attribute to limit the concurrency of a loop in your kernel. The concurrency of a loop is how many iterations of that loop can be in progress at one time. By default, the Intel® oneAPI DPC++/C++ Compiler tries to maximize the concurrency of loops so that your kernel runs at peak throughput.
Syntax
[[intel::max_concurrency(n)]]
The max_concurrency attribute applies to pipelined loops in single task kernels. Refer to Pipelining for information about loop pipelining.
The max_concurrency attribute enables you to control the on-chip memory resources required to pipeline your loop. To achieve simultaneous execution of loop iterations, the Intel® oneAPI DPC++/C++ Compiler must create copies of any memory that is private to a single iteration. These copies are called private copies. The greater the permitted concurrency, the more private copies the compiler must create.
The attribute parameter n is required and must be a non-negative constant expression of integer type. The parameter directs the compiler to restrict the loop's concurrency to n simultaneous iterations.
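As a rough illustration of why higher concurrency costs more on-chip memory, the following sketch estimates the storage taken by private copies. The helper private_copy_bytes and the 1024-element array are hypothetical, chosen only for this example. The model assumes one full copy per in-flight iteration and ignores how the compiler actually maps memory to hardware.

```cpp
#include <cstddef>

// Hypothetical helper, not part of any Intel API: estimates the bytes needed
// when each in-flight iteration holds its own private copy of one array.
constexpr std::size_t private_copy_bytes(std::size_t concurrency,
                                         std::size_t elements,
                                         std::size_t element_size) {
  return concurrency * elements * element_size;
}

// Assumed example: a 1024-element array of 4-byte values that is private
// to each loop iteration.
constexpr std::size_t kElements = 1024;
constexpr std::size_t kElementSize = 4;

// One iteration in flight needs a single copy (4096 bytes).
static_assert(private_copy_bytes(1, kElements, kElementSize) == 4096,
              "one private copy");
// Four iterations in flight need four copies (16384 bytes).
static_assert(private_copy_bytes(4, kElements, kElementSize) == 16384,
              "four private copies");
```

Under these assumptions, raising the permitted concurrency from 1 to 4 quadruples the estimate, which is the trade-off the attribute lets you control.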
The kernel's report.html file (refer to Review the report.html File) provides the following information pertaining to loop concurrency:
- Maximum concurrency that the Intel® oneAPI DPC++/C++ Compiler has chosen: This information is available in the Loops Analysis report and Kernel Memory Viewer.
  - In the Loops Analysis report, a message in the Details pane reports that the maximum number of simultaneous executions has been limited to N. The unsigned value N can be greater than or equal to zero. A value of N = 0 indicates unlimited concurrency.
  - In the Kernel Memory Viewer, the bank view of your local memory graphically shows the number of private copies.
- Impact to memory usage: This information is available in the Area Analysis of System report. A message in the Details pane reports that the Intel® oneAPI DPC++/C++ Compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.
If you want to exchange some performance for physical memory savings, apply [[intel::max_concurrency(n)]] to the loop, as shown in the following code snippet:
    [[intel::max_concurrency(1)]]
    for (int i = 0; i < N; i++) {
      int arr[M];
      // Doing work on arr
    }
When you apply this attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to n. The number of private copies of loop-scoped memories is also restricted to n.
You can also control the number of private copies (created for a local memory and accessed within a loop) by using [[intel::private_copies(N)]]. If a local memory with [[intel::private_copies(N)]] is accessed within a loop that has the [[intel::max_concurrency(M)]] attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to min(M,N). For more information about [[intel::private_copies(N)]], refer to FPGA Memory Attributes.
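To show where the two attributes sit relative to each other, here is a minimal sketch with one loop and one loop-scoped array. Several things are assumptions made for the example: the FPGA_MAX_CONCURRENCY and FPGA_PRIVATE_COPIES macros, the __INTEL_LLVM_COMPILER guard that lets the file also build as ordinary host C++, the function process_rows, and the sizes kRows and kCols. In a real design the loop would live inside a single task kernel.

```cpp
#include <cassert>

// Assumption: expand the FPGA attributes only when building with the Intel
// compiler, so the same file also compiles as plain C++ elsewhere.
#if defined(__INTEL_LLVM_COMPILER)
#define FPGA_MAX_CONCURRENCY(n) [[intel::max_concurrency(n)]]
#define FPGA_PRIVATE_COPIES(n) [[intel::private_copies(n)]]
#else
#define FPGA_MAX_CONCURRENCY(n)
#define FPGA_PRIVATE_COPIES(n)
#endif

constexpr int kRows = 8;   // hypothetical outer trip count
constexpr int kCols = 16;  // hypothetical size of the loop-scoped array

// Hypothetical loop body: doubles each input value into a scratch array
// that is private to one outer iteration, then accumulates the results.
int process_rows(const int* in) {
  assert(in != nullptr);
  int total = 0;
  FPGA_MAX_CONCURRENCY(4)  // M = 4 on the loop
  for (int i = 0; i < kRows; i++) {
    FPGA_PRIVATE_COPIES(2) int scratch[kCols];  // N = 2 on the memory
    for (int j = 0; j < kCols; j++) {
      scratch[j] = in[i * kCols + j] * 2;
    }
    for (int j = 0; j < kCols; j++) {
      total += scratch[j];
    }
  }
  return total;
}
```

With M = 4 on the loop and N = 2 on the array, the rule described above would cap the loop at min(4, 2) = 2 simultaneous iterations.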
For additional information, refer to the FPGA tutorial sample “Loop Max Concurrency” listed in the Intel® oneAPI Samples Browser on Linux* or Intel® oneAPI Samples Browser on Windows*, or access the code sample on GitHub.