• 04/03/2020

Using Dedicated Thread Scratch Memory with Advanced Tiling Kernels

The kernel function vx_advanced_tiling_kernel_intel_f may be called by multiple runtime threads simultaneously to process many tiles in parallel. A kernel may require scratch memory dedicated to each worker thread, either to aid in processing or to serve as thread-specific storage.
If the desired size of dedicated thread scratch memory is known before vxFinalizeKernel is called, set the following kernel attribute to that size in bytes:
VX_KERNEL_TILE_MEMORY_SIZE_INTEL
If the desired size of dedicated thread scratch memory is not constant across all instances of a given kernel, and instead depends on a parameter or attribute that is not known until vxVerifyGraph, set the following node attribute to that size in bytes:
VX_NODE_TILE_MEMORY_SIZE_INTEL
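As a rough illustration of the two cases above, the sketch below computes a scratch size that is constant for a kernel and one that depends on values only available at graph verification. The helper names, the 256-bin histogram, and the filter radius are hypothetical; the setter calls shown in the trailing comment use the standard OpenVX attribute setters and assume the attribute value is a vx_size, which this page does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical case 1: the size is the same for every instance of the
 * kernel (a 256-bin histogram of 32-bit counters), so it is known before
 * vxFinalizeKernel. */
enum { EXAMPLE_HIST_BINS = 256 };

size_t kernel_scratch_bytes(void)
{
    return EXAMPLE_HIST_BINS * sizeof(uint32_t);
}

/* Hypothetical case 2: the size depends on values that are only known at
 * vxVerifyGraph, here a tile width and a filter radius. The scratch holds
 * one padded row of floats. */
size_t node_scratch_bytes(size_t tile_width, size_t radius)
{
    return (tile_width + 2 * radius) * sizeof(float);
}

/* With the SDK headers available, the sizes would presumably be handed to
 * the runtime like this (value type vx_size is an assumption):
 *
 *   vx_size bytes = kernel_scratch_bytes();
 *   vxSetKernelAttribute(kernel, VX_KERNEL_TILE_MEMORY_SIZE_INTEL,
 *                        &bytes, sizeof(bytes));
 *
 *   vx_size node_bytes = node_scratch_bytes(tile_width, radius);
 *   vxSetNodeAttribute(node, VX_NODE_TILE_MEMORY_SIZE_INTEL,
 *                      &node_bytes, sizeof(node_bytes));
 */
```

Keeping the size computation in a small helper makes it easy to reuse the same value later when the kernel checks tile_memory_size.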
If either attribute is set, the runtime allocates a cache-aligned buffer for each runtime thread that may call vx_advanced_tiling_kernel_intel_f during vxProcessGraph. On each call to the advanced tiling kernel function, the runtime passes in tile_memory, the starting pointer of the calling thread's dedicated scratch memory buffer, along with tile_memory_size, the allocated size of the buffer:
typedef vx_status (*vx_advanced_tiling_kernel_intel_f)(vx_node node, void * parameters[], vx_uint32 num, void * tile_memory, vx_size tile_memory_size);
If neither of the “tile memory size” attributes described above is set, vx_advanced_tiling_kernel_intel_f is called with tile_memory equal to NULL and tile_memory_size equal to 0.
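A kernel function can guard against the NULL case before touching the buffer. The following is a minimal sketch, not Intel's reference code: the typedefs are stand-ins so the fragment compiles without the SDK, and the function name and the 64-float scratch requirement are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in declarations so the sketch compiles on its own. In a real
 * project these come from the OpenVX headers and must be removed here. */
typedef int32_t  vx_status;
typedef uint32_t vx_uint32;
typedef size_t   vx_size;
typedef struct _vx_node *vx_node;
enum { VX_SUCCESS = 0, VX_FAILURE = -1 };

/* Hypothetical requirement: room for 64 floats of per-thread scratch. */
#define EXAMPLE_SCRATCH_BYTES (64 * sizeof(float))

/* Same parameter list as vx_advanced_tiling_kernel_intel_f. */
vx_status example_tile_kernel(vx_node node, void *parameters[], vx_uint32 num,
                              void *tile_memory, vx_size tile_memory_size)
{
    (void)node;
    (void)parameters;
    (void)num;

    /* If neither size attribute was set, the pointer is NULL and the size
     * is 0. Also reject a buffer smaller than this kernel needs. */
    if (tile_memory == NULL || tile_memory_size < EXAMPLE_SCRATCH_BYTES)
        return VX_FAILURE;

    /* The buffer is dedicated to the calling thread, so no locking is
     * needed while it is used as temporary storage for this tile. */
    float *scratch = (float *)tile_memory;
    memset(scratch, 0, EXAMPLE_SCRATCH_BYTES);

    /* ... process the tile using scratch as working storage ... */

    return VX_SUCCESS;
}
```

Checking tile_memory_size as well as the pointer catches the case where the attribute was set to a smaller value than the kernel expects.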
In addition to being passed to vx_advanced_tiling_kernel_intel_f, the dedicated thread scratch memory buffers are also passed to the optional callback functions vx_kernel_preprocess_intel_f and vx_kernel_postprocess_intel_f:
typedef vx_status (*vx_kernel_preprocess_intel_f)(vx_node node, const vx_reference *parameters, vx_uint32 num, void * tile_memory[], vx_uint32 num_tile_memory_elements, vx_size tile_memory_size);
typedef vx_status (*vx_kernel_postprocess_intel_f)(vx_node node, const vx_reference *parameters, vx_uint32 num, void * tile_memory[], vx_uint32 num_tile_memory_elements, vx_size tile_memory_size);
Because these functions are called only once per vxProcessGraph, pointers to all of the thread-specific buffers are passed in as parameters:
tile_memory is an array of pointers. Each element of this array points to the dedicated scratch memory buffer of a particular thread.
num_tile_memory_elements specifies the number of elements in the tile_memory array.
tile_memory_size specifies the allocated size of each buffer.
Access to the thread-specific scratch buffers during “Pre-Process” and “Post-Process” is useful in many situations. One example is a custom histogram kernel. Such a kernel may accumulate histogram data into thread-specific histograms, which is much more efficient than synchronizing access to a single shared histogram during vxProcessGraph. Within vx_kernel_preprocess_intel_f, the kernel would need to ensure that the thread-specific histograms are initialized to zero. Within vx_kernel_postprocess_intel_f, the kernel would generate the final histogram by summing, for each bin, the counts from every thread-specific histogram.
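The histogram pattern described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than Intel's implementation: the typedefs are stand-ins so the code compiles without the SDK, the function names, the 256-bin layout, and the global result array are hypothetical, and the accumulate helper receives pixels directly because this page does not describe how tile data is read from the parameters array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in declarations so the sketch compiles on its own. In a real
 * project these come from the OpenVX headers and must be removed here. */
typedef int32_t  vx_status;
typedef uint32_t vx_uint32;
typedef size_t   vx_size;
typedef struct _vx_node *vx_node;
typedef struct _vx_reference *vx_reference;
enum { VX_SUCCESS = 0, VX_FAILURE = -1 };

#define HIST_BINS  256
#define HIST_BYTES (HIST_BINS * sizeof(uint32_t))

/* Hypothetical destination for the merged result. A real kernel would
 * write into one of its output parameters instead. */
uint32_t g_final_hist[HIST_BINS];

/* Pre-process: zero every thread-specific histogram. */
vx_status hist_preprocess(vx_node node, const vx_reference *parameters,
                          vx_uint32 num, void *tile_memory[],
                          vx_uint32 num_tile_memory_elements,
                          vx_size tile_memory_size)
{
    (void)node; (void)parameters; (void)num;
    if (tile_memory_size < HIST_BYTES)
        return VX_FAILURE;
    for (vx_uint32 t = 0; t < num_tile_memory_elements; ++t)
        memset(tile_memory[t], 0, HIST_BYTES);
    return VX_SUCCESS;
}

/* Per-tile work, meant to be called from the tiling kernel function with
 * the calling thread's own tile_memory. No synchronization is needed
 * because each thread only touches its own histogram. */
void hist_accumulate(const uint8_t *pixels, size_t count, void *tile_memory)
{
    uint32_t *hist = (uint32_t *)tile_memory;
    for (size_t i = 0; i < count; ++i)
        hist[pixels[i]]++;
}

/* Post-process: sum each bin across all thread-specific histograms. */
vx_status hist_postprocess(vx_node node, const vx_reference *parameters,
                           vx_uint32 num, void *tile_memory[],
                           vx_uint32 num_tile_memory_elements,
                           vx_size tile_memory_size)
{
    (void)node; (void)parameters; (void)num;
    if (tile_memory_size < HIST_BYTES)
        return VX_FAILURE;
    memset(g_final_hist, 0, sizeof g_final_hist);
    for (vx_uint32 t = 0; t < num_tile_memory_elements; ++t) {
        const uint32_t *src = (const uint32_t *)tile_memory[t];
        for (size_t b = 0; b < HIST_BINS; ++b)
            g_final_hist[b] += src[b];
    }
    return VX_SUCCESS;
}
```

The merge cost in the post-process step is proportional to the number of threads times the number of bins, which is typically small compared with the per-pixel cost of locking a shared histogram.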
