• 04/03/2020

Advanced Tiling Sample Code for Error Diffusion

Error diffusion is a common technique used to convert a multi-level image to bi-level. In this example, we will implement a version of error diffusion that takes as input a single channel, 8 bit/pixel image (range 0 to 255), and will produce a single channel, 8 bit/pixel image whose pixel values are either 0 or 255.
The basic idea is to compare each input pixel value to a fixed threshold (127 in this example). If the pixel value is above the threshold, the corresponding output pixel is set to 255; otherwise it is set to 0. An “error” value is calculated in the following way:
Error = InputPixelValue – OutputPixelValue
For example, say that a given input pixel value is 96. This value is below the threshold of 127, so the output pixel value is set to 0. The error in this case is (96 – 0) = 96.
The computed error for each pixel is then distributed to neighboring input pixels. For the Floyd-Steinberg error diffusion that we implement, the “coefficients” used to distribute error to neighboring pixels are as follows, where X marks the current pixel and - marks a position that receives no error:

    -       X       7/16
    3/16    5/16    1/16
Using our example above, (7/16)*96 = 42 would be distributed to the neighboring input pixel to the right, (1/16)*96 = 6 would be distributed to the neighboring input pixel down and to the right, and so on.
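The thresholding and distribution steps described above can be sketched in plain C, independent of OpenVX. This is a minimal illustration rather than the sample's actual kernel: the function names `diffuse_pixel` and `error_diffuse` are hypothetical, and the error buffer layout (two extra columns, one extra row) is an assumption chosen so that the neighbors of edge pixels always exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold used in the text above. */
#define THRESHOLD 127

/*
 * Quantize one pixel and spread its error with Floyd-Steinberg weights.
 * err is an assumed padded float buffer with (width + 2) columns and
 * (height + 1) rows; stride is the padded row length, width + 2.
 */
static uint8_t diffuse_pixel(uint8_t in, float *err, size_t stride,
                             size_t x, size_t y)
{
    /* Image column x is stored at buffer column x + 1 (left padding). */
    float *e = err + y * stride + (x + 1);
    float value = (float)in + *e;           /* input plus accumulated error */
    uint8_t out = (value > THRESHOLD) ? 255 : 0;
    float error = value - (float)out;       /* Error = Input - Output */

    e[1]          += error * 7.0f / 16.0f;  /* right       */
    e[stride - 1] += error * 3.0f / 16.0f;  /* below-left  */
    e[stride]     += error * 5.0f / 16.0f;  /* below       */
    e[stride + 1] += error * 1.0f / 16.0f;  /* below-right */
    return out;
}

/* Run the diffusion over a whole image in raster order.
 * The caller must zero err before the first call. */
static void error_diffuse(const uint8_t *in, uint8_t *out,
                          size_t width, size_t height, float *err)
{
    assert(in && out && err && width > 0 && height > 0);
    size_t stride = width + 2;
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            out[y * width + x] =
                diffuse_pixel(in[y * width + x], err, stride, x, y);
}
```

For the input value 96 with no accumulated error, `diffuse_pixel` returns 0 and adds 42, 18, 30, and 6 to the right, below-left, below, and below-right entries of the error buffer, which together account for the full error of 96.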
The following images show the input grayscale image on the left and the result of the error diffusion on the right:
[Images: input grayscale image (left), error diffusion result (right)]
The error diffusion technique is a good example for showing the need for some of the optional callback functions that the advanced tiling API supports because:
  • To process a given input pixel, the neighboring input pixels have to complete processing. This means that we will need to set the output tile serialization attribute.
  • We need a dedicated “error buffer” that we will allocate within the “initialize” function. Accordingly, we deallocate this buffer within the “deinitialize” function.
  • We need to initialize our error buffer to 0s before each vxProcessGraph; therefore, we have to implement the “preprocess” function.
We skip the input and output validators’ code, as these are similar to what we defined in the previous example. Also, in the case of error diffusion, there is a one-to-one correspondence between input tiles and output tiles; hence, we omit the tile mapping function, which is trivial.
The following code snippet shows the implementation for the “initialize” callback function:
vx_status ErrorDiffusionInitialize(vx_node node, const vx_reference *parameters, vx_uint32 num)
{
    // we are going to allocate a floating
    // point error buffer, such that there is an error entry
    // for each pixel.
    vx_image input = (vx_image)parameters[0];
    vx_int32 width, height;
    vxQueryImage(input, VX_IMAGE_WIDTH, &width, sizeof(width));
    vxQueryImage(input, VX_IMAGE_HEIGHT, &height, sizeof(height));

    // we pad image with 2 pixels, to prevent memory
    // access violations on the right and left edges of the image.
    width += 2;

    // we add 1 to the height, to prevent memory
    // access violations on the bottom edge of the image
    height += 1;

    vx_float32 *pErrorBuffer = (vx_float32*)malloc(width*height*sizeof(vx_float32));
    if (!pErrorBuffer)
    {
        return VX_ERROR_NO_MEMORY;
    }

    // free previously set local ptr for this node
    vx_float32 *p = 0;
    vxQueryNode(node, VX_NODE_LOCAL_DATA_PTR, &p, sizeof(p));
    if (p) free(p);

    // set the 'local data ptr' for this node to the new errors buffer.
    return vxSetNodeAttribute(node, VX_NODE_LOCAL_DATA_PTR, &pErrorBuffer, sizeof(pErrorBuffer));
}
In the case of error diffusion, the initialize function (which is called once per user call to vxVerifyGraph) allocates our error buffer. The buffer is used at runtime to propagate error to neighboring pixels. We set a node attribute, the “local data pointer”, with the allocated buffer. Note that it is a good practice to check if a previously allocated buffer already exists, to prevent memory leaks. This data pointer can be retrieved inside the kernel function using vxQueryNode.
“Deinitialize” is only called upon node destruction, so two successive user calls to vxVerifyGraph with the same graph imply two successive calls to our “initialize” function, and on the second call a previously allocated buffer may already be set.
The following code snippet shows the implementation for the “deinitialize” callback function:
vx_status ErrorDiffusionDeinitialize(vx_node node, const vx_reference *parameters, vx_uint32 num)
{
    vx_float32 *pErrorBuffer = 0;
    vxQueryNode(node, VX_NODE_LOCAL_DATA_PTR, &pErrorBuffer, sizeof(pErrorBuffer));
    if (pErrorBuffer)
    {
        free(pErrorBuffer);
        pErrorBuffer = 0;
        // set the local data ptr to 0
        vxSetNodeAttribute(node, VX_NODE_LOCAL_DATA_PTR, &pErrorBuffer, sizeof(pErrorBuffer));
    }
    return VX_SUCCESS;
}
The “deinitialize” function is called once upon node destruction. We must free the error buffer that was allocated within the “initialize” function. Note that it is required to set the node’s “local data ptr” back to 0 to prevent the runtime from attempting to also free the pointer.
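The ownership rules described for “initialize” and “deinitialize” can be illustrated without OpenVX. The sketch below is an assumption-laden stand-in: `mock_node`, `mock_initialize`, and `mock_deinitialize` are hypothetical names, and the struct field only mimics the role of the node's local data pointer. It shows that a repeated initialize replaces the old buffer without leaking it, and that deinitialize leaves the slot empty.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node's "local data pointer" slot.
 * This is not an OpenVX type. */
typedef struct {
    float *local_data;
} mock_node;

/* Mimics "initialize": allocate a padded error buffer, then release
 * any buffer left over from an earlier call before storing the new one. */
static int mock_initialize(mock_node *node, size_t width, size_t height)
{
    assert(node);
    size_t count = (width + 2) * (height + 1);
    float *buf = malloc(count * sizeof *buf);
    if (!buf)
        return -1;              /* out of memory: keep the old buffer */
    free(node->local_data);     /* free(NULL) is a no-op */
    node->local_data = buf;
    return 0;
}

/* Mimics "deinitialize": free the buffer and clear the slot so that
 * nothing else tries to free the same pointer again. */
static void mock_deinitialize(mock_node *node)
{
    assert(node);
    free(node->local_data);
    node->local_data = NULL;
}
```

Calling `mock_initialize` twice on the same node, as two successive graph verifications would, followed by one `mock_deinitialize`, releases both buffers exactly once.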
The following code snippet shows the “set tile dimensions” callback function:
vx_status ErrorDiffusionSetTileDimensions(vx_node node,
                                          const vx_reference *parameters,
                                          vx_uint32 num,
                                          const vx_tile_block_size_intel_t *current_tile_dimensions,
                                          vx_tile_block_size_intel_t *updated_tile_dimensions)
{
    vx_image input = (vx_image)parameters[0];
    vx_int32 width;
    vxQueryImage(input, VX_IMAGE_WIDTH, &width, sizeof(width));

    // Set the desired tile width to the entire input image width
    updated_tile_dimensions->width = width;

    // Keep the height as the current tile height
    updated_tile_dimensions->height = current_tile_dimensions->height;

    return VX_SUCCESS;
}
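One plausible reading of why full-width tiles are requested: error flows to the right along a row and down into the next row, so a tile that spans the whole image width depends only on the tiles above it. The plain C sketch below, which does not use OpenVX, processes an image as full-width bands from top to bottom with one persistent error buffer. The names `diffuse_rows` and `diffuse_tiled` are hypothetical, and the padded buffer layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Diffuse rows [row_begin, row_end) of a full-width band.
 * err is an assumed (width + 2) x (height + 1) float buffer that
 * persists across calls and carries error into the next band. */
static void diffuse_rows(const uint8_t *in, uint8_t *out, size_t width,
                         size_t row_begin, size_t row_end, float *err)
{
    assert(in && out && err && row_begin <= row_end);
    size_t stride = width + 2;
    for (size_t y = row_begin; y < row_end; ++y) {
        for (size_t x = 0; x < width; ++x) {
            float *e = err + y * stride + (x + 1);
            float value = (float)in[y * width + x] + *e;
            uint8_t q = (value > 127.0f) ? 255 : 0;
            float error = value - (float)q;
            e[1]          += error * 7.0f / 16.0f;  /* right       */
            e[stride - 1] += error * 3.0f / 16.0f;  /* below-left  */
            e[stride]     += error * 5.0f / 16.0f;  /* below       */
            e[stride + 1] += error * 1.0f / 16.0f;  /* below-right */
            out[y * width + x] = q;
        }
    }
}

/* Process the image as serial full-width tiles, top to bottom. */
static void diffuse_tiled(const uint8_t *in, uint8_t *out, size_t width,
                          size_t height, size_t tile_height, float *err)
{
    assert(tile_height > 0);
    /* Zero the whole padded buffer first, sized by element type. */
    memset(err, 0, (width + 2) * (height + 1) * sizeof(float));
    for (size_t y = 0; y < height; y += tile_height) {
        size_t end = (y + tile_height < height) ? y + tile_height : height;
        diffuse_rows(in, out, width, y, end, err);
    }
}
```

Because the bands are visited in the same raster order as a single pass over the image, any tile height produces the same output, provided the tiles are processed serially.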
The following code snippet shows the implementation for the “preprocess” callback function:
vx_status ErrorDiffusionPreProcess(vx_node node,
                                   const vx_reference *parameters,
                                   vx_uint32 num,
                                   void *tile_memory[],
                                   vx_uint32 num_tile_memory_elements,
                                   vx_size tile_memory_size)
{
    vx_image input = (vx_image)parameters[0];
    vx_int32 width, height;
    vxQueryImage(input, VX_IMAGE_WIDTH, &width, sizeof(width));
    vxQueryImage(input, VX_IMAGE_HEIGHT, &height, sizeof(height));

    vx_float32 *pErrorBuffer = 0;
    vxQueryNode(node, VX_NODE_LOCAL_DATA_PTR, &pErrorBuffer, sizeof(pErrorBuffer));
    if (!pErrorBuffer)
        return VX_ERROR_NOT_ALLOCATED;

    // patch our width & height (following the logic of the Initialize function)
    width += 2;
    height += 1;

    // initialize our error buffer to all 0's; the byte count uses the
    // element size (vx_float32), matching the allocation in Initialize,
    // not the size of the pointer
    memset(pErrorBuffer, 0, width*height*sizeof(vx_float32));

    return VX_SUCCESS;
}
The “preprocess” callback function is called at the beginning of every user call to vxProcessGraph, before any nodes have started processing. We use this function to re-initialize our error buffer.
For brevity, the body of the error diffusion kernel itself is skipped:
static vx_status ErrorDiffusionAdvancedTilingKernel(vx_node node,
                                                    void *parameters[],
                                                    vx_uint32 num,
                                                    void *tile_memory,
                                                    vx_size tile_memory_size)
{
    vx_tile_t *pInTileIn = (vx_tile_t *)parameters[0];
    vx_tile_t *pInTileOut = (vx_tile_t *)parameters[1];
    ...
    return VX_SUCCESS;
}
Finally, to create the advanced tiling kernel, the vxAddAdvancedTilingKernelIntel interface should be invoked as in the following code snippet:
// create a kernel via the vxAddAdvancedTilingKernelIntel interface
vx_kernel kernel = vxAddAdvancedTilingKernelIntel(context,
    (vx_char *)"ErrorDiffusion",        // name
    ERRORDIFFUSION_ID,                  // enumeration
    ErrorDiffusionAdvancedTilingKernel, // kernel_func_ptr
    ErrorDiffusionTileMapping,          // mapping_func_ptr
    2,                                  // num_params
    ErrorDiffusionInputValidator,       // input validate
    ErrorDiffusionOutputValidator,      // output validate
    ErrorDiffusionInitialize,           // initialize
    ErrorDiffusionDeinitialize,         // deinitialize
    ErrorDiffusionPreProcess,           // preprocess
    NULL,                               // postprocess
    ErrorDiffusionSetTileDimensions,    // settiledimensions
    NULL);                              // tiledimensionsinit

// specify the parameters for this kernel
vxAddParameterToKernel(kernel, 0, VX_INPUT, VX_TYPE_IMAGE, VX_PARAMETER_STATE_REQUIRED);
vxAddParameterToKernel(kernel, 1, VX_OUTPUT, VX_TYPE_IMAGE, VX_PARAMETER_STATE_REQUIRED);

// set the serial attribute, to produce output tiles serially
vx_intel_serial_type_intel_e serial_type = VX_SERIAL_LEFTTOP_TO_RIGHTBOTTOM_INTEL;
vxSetKernelAttribute(kernel, VX_KERNEL_SERIAL_TYPE_INTEL, &serial_type, sizeof(serial_type));

// done with publishing
vxFinalizeKernel(kernel);
 
