Closures and Capturing - Part 2 : C++ code inside ArBB functions

Please refer to the article Closures and Capturing Part I to get a better understanding of how closures and capturing work.

Mixing operations on Intel® Array Building Blocks (Intel® ArBB) types with regular C++ types, such as float or std::ostream, has important implications. A common example is a programmer who wants to put C++ code inside Intel ArBB functions. One idea is to insert output statements such as cout or printf for "quick and dirty" debugging instead of using the Intel ArBB debugger.

We recommend keeping your C++ code to a minimum inside Intel ArBB functions for the following reasons:

  1. C++ code inside of Intel ArBB functions is not optimized for multiple cores. It is inlined.
  2. C++ code is executed only during the capturing process, which happens the first time you use the call() operator to invoke the Intel ArBB function. To execute the C++ statements again, you would have to recapture the function. So even if you have cout or printf statements inside the Intel ArBB function, you will get no output after the first run.
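The second point can be illustrated without Intel ArBB at all. The following is a minimal sketch in plain C++ of a toy capture-and-replay mechanism; Op, Tracer, ToyCall, and kernel are hypothetical names invented for this illustration and are not part of the Intel ArBB API. It only assumes the behavior described above: the function body runs once to record its operations, and later calls replay the recording.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <vector>

// Toy model only: Op, Tracer, ToyCall and kernel are hypothetical names,
// not part of the Intel ArBB API.
using Op = std::function<void(float&)>;

// Handed to the function body during capture. Each method records an
// operation for later replay instead of performing it immediately.
struct Tracer {
    std::vector<Op>& ops;
    void scale(float k) { ops.push_back([k](float& v) { v *= k; }); }
    void add(float k)   { ops.push_back([k](float& v) { v += k; }); }
};

int g_cpp_statement_runs = 0;  // counts executions of the plain C++ lines

void kernel(Tracer& x) {
    ++g_cpp_statement_runs;          // plain C++: executed, never recorded
    std::cout << "inside kernel\n";  // plain C++: executed, never recorded
    x.scale(2.0f);                   // recorded operation
    x.add(1.0f);                     // recorded operation
}

// Stand-in for a call()-style wrapper: captures on the first invocation,
// then only replays the recorded operations.
class ToyCall {
public:
    explicit ToyCall(void (*fn)(Tracer&)) : fn_(fn) {}

    float operator()(float input) {
        assert(fn_ != nullptr);
        if (!captured_) {
            Tracer tracer{ops_};
            fn_(tracer);             // the only time the C++ body runs
            captured_ = true;
        }
        for (const Op& op : ops_) op(input);  // replay the recording
        return input;
    }

private:
    void (*fn_)(Tracer&);
    std::vector<Op> ops_;
    bool captured_ = false;
};
```

With ToyCall call(kernel), both call(3.0f) and call(10.0f) apply the recorded scale and add, but "inside kernel" is printed only during the first invocation, which mirrors why debugging output disappears after the first run.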

If you are trying to see the contents of Intel ArBB data types without the use of the debugger (just outputting some results to the screen), you could create an Intel ArBB function but not use the call() mechanism. This would be the equivalent of running in emulation mode.

After executing a function, the closure contains a sequence of instructions exactly corresponding to all the operations on Intel ArBB types performed by the function, NOT the C++ code. The control flow macros such as _if also simply record their presence: capturing executes both bodies of an if-statement and executes a loop body exactly once, while maintaining the control flow structure in the closure. Please see the situations at the bottom of the article, which demonstrate cases where you may want to use standard C++ control flow in place of Intel ArBB control flow.
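To make the difference between recorded control flow and ordinary C++ control flow concrete, here is a small sketch of a toy recorder in plain C++. Program, Recorder, if_positive, and the capture_* helpers are hypothetical names made up for this example; they are not Intel ArBB macros or functions. The recorded conditional visits both bodies once at capture time and keeps the branch structure, whereas a C++ if is decided once at capture time and only the chosen branch ends up in the recording.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model only: these names are hypothetical, not Intel ArBB API.
using Program = std::vector<std::function<void(float&)>>;

struct Recorder {
    Program& prog;

    void add(float k) { prog.push_back([k](float& v) { v += k; }); }

    // Recorded conditional: BOTH bodies run once while capturing so their
    // operations can be recorded; the branch is chosen later, at run time.
    void if_positive(const std::function<void(Recorder&)>& then_body,
                     const std::function<void(Recorder&)>& else_body) {
        assert(then_body && else_body);
        Program then_prog, else_prog;
        Recorder then_rec{then_prog}, else_rec{else_prog};
        then_body(then_rec);
        else_body(else_rec);
        prog.push_back([then_prog, else_prog](float& v) {
            const Program& chosen = (v > 0.0f) ? then_prog : else_prog;
            for (const auto& op : chosen) op(v);
        });
    }
};

int bodies_visited = 0;  // how many branch bodies ran during capture

// Control flow is recorded: the program still branches on the data.
Program capture_recorded_branch() {
    Program prog;
    Recorder r{prog};
    r.if_positive(
        [](Recorder& b) { ++bodies_visited; b.add(100.0f); },
        [](Recorder& b) { ++bodies_visited; b.add(-100.0f); });
    return prog;
}

// Plain C++ if: evaluated once at capture time, so only one branch is baked in.
Program capture_cpp_branch(bool flag) {
    Program prog;
    Recorder r{prog};
    if (flag) r.add(100.0f); else r.add(-100.0f);
    return prog;
}

float run(const Program& p, float v) {
    for (const auto& op : p) op(v);
    return v;
}
```

In this sketch, a program from capture_recorded_branch() still picks a branch per input, while a program from capture_cpp_branch(true) always adds 100 regardless of the input. That is the trade-off behind choosing standard C++ control flow for decisions that are fixed at capture time.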

Once a closure is captured, Intel ArBB can compile it for a supported architecture and use the compiled object to execute the computations captured from the function repeatedly without any compilation overhead. As this process occurs at run time, the generated code can be optimized for the exact machine configuration in use.

However, there are situations where compilation overhead is nonetheless incurred. Currently, closures are dynamically recompiled based on argument attributes, so varying the following attributes of an input or output argument causes additional compilation overhead:

- for containers, whether the argument is bound or not

- for 2D and 3D bound containers, whether the argument is strided or not

Once a closure has been compiled for an ordered set of arguments with particular attributes, the closure will not be recompiled for arguments with the same attributes. To avoid implicit dynamic recompilation as part of closure execution, use a closure's compile() member function. compile() internally prepares a closure to execute with given arguments, or arguments with the same attributes, so that implicit dynamic recompilation does not occur during execution. compile() can be called many times to prepare a closure for more than one combination of attributes. All compilation happens at run time so the generated code can be optimized for the exact machine configuration in use.
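The recompilation rule can be pictured as a cache keyed by the attributes of the argument list. The sketch below is a toy model in plain C++, not the Intel ArBB implementation: ArgAttrs, Signature, and ToyClosure are hypothetical names, and the real compile() takes actual arguments rather than an attribute list. It assumes only what is stated above: one compilation per distinct ordered set of attributes, and compile() lets you pay that cost ahead of execution.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <vector>

// Toy model only: these names are hypothetical, not Intel ArBB API.
struct ArgAttrs {
    bool bound;    // container is bound to user memory
    bool strided;  // bound 2D/3D container is strided
    bool operator<(const ArgAttrs& o) const {
        return std::tie(bound, strided) < std::tie(o.bound, o.strided);
    }
};

// Attributes of the ordered argument list of one invocation.
using Signature = std::vector<ArgAttrs>;

class ToyClosure {
public:
    // Prepare the closure for one combination of attributes ahead of time.
    // Compiling the same combination again is a no-op.
    void compile(const Signature& sig) {
        if (compiled_.insert(sig).second) ++compile_count_;
    }

    // Executing with an unseen combination triggers an implicit compile.
    void execute(const Signature& sig) {
        compile(sig);
        assert(compiled_.count(sig) == 1);
        // ... run the compiled object for this signature ...
    }

    int compile_count() const { return compile_count_; }

private:
    std::set<Signature> compiled_;
    int compile_count_ = 0;
};
```

Calling compile() once for each attribute combination you expect, for example unbound and bound-strided, means later execute() calls with those combinations add no compilations in this model; only a new combination such as bound-unstrided would trigger another one.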
