API Quick Reference Guide

  • 2020
  • 09/30/2019

Tiling and Threading

The API of Integration Wrappers (IW) is designed to simplify tile-based processing of images. Tiling is based on the concept of region of interest (ROI).
Most IW image processing functions operate not only on whole images but also on image areas (ROIs). An image ROI is a rectangular area that is either some part of the image or the whole image.
The ROI of an image is defined by its size and its offset from the image origin, as shown in the figure below. The origin of an image is in the top left corner, with x values increasing from left to right and y values increasing downwards.
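As a rough illustration of the ROI definition above, the following sketch clips a requested rectangle so that it stays inside the image. The `Size` and `Roi` structures and the `clipRoiToImage` helper are hypothetical names introduced for this example only; they are not the IW `IwiSize` and `IwiRoi` types.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types for illustration; not the IW IwiSize/IwiRoi types.
struct Size { long width, height; };
struct Roi  { long x, y, width, height; };

// Clip a requested ROI so that it lies entirely inside the image.
// The origin (0, 0) is the top left pixel; x grows to the right, y grows downwards.
inline Roi clipRoiToImage(const Roi& roi, const Size& image)
{
    assert(image.width >= 0 && image.height >= 0);
    assert(roi.width >= 0 && roi.height >= 0);

    // Clamp the top left corner into [0, image size]
    const long x0 = std::max(0L, std::min(roi.x, image.width));
    const long y0 = std::max(0L, std::min(roi.y, image.height));
    // Clamp the bottom right corner into [top left corner, image size]
    const long x1 = std::max(x0, std::min(roi.x + roi.width,  image.width));
    const long y1 = std::max(y0, std::min(roi.y + roi.height, image.height));

    return Roi{x0, y0, x1 - x0, y1 - y0};
}
```

For a 320x240 image, a requested ROI with offset (0, 180) and size 320x80 would be reduced to 320x60 by this helper, because only 60 rows remain below row 180.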

Borders Overlapping

Image filters use the borders concept to correctly process image pixels around the current pixel. A filter kernel can be applied to pixels that are outside of image boundaries, and the function must either extrapolate pixels using one of the border extrapolation methods (replicate, mirror, etc.) or use pixels from memory if the image border physically exists in memory.
Borders can complicate tiling, because for each tile you need to apply the proper border InMem flags according to the current tile position relative to the image. If the filter border size is greater than 1 pixel, for some tile positions the filter and image borders can overlap, which means that the filter border can be inside and outside of the image at the same time. Intel IPP functions do not support input with undefined borders; in such cases, filtering may result in distorted pixels around the image borders.
Overlapping may happen only if the filter border size is more than 1 pixel and the following conditions are true:
  • For left and top borders: tile_size < border_size
  • For right and bottom borders: (image_size % tile_size > 0) && (image_size % tile_size < border_size)
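The two conditions above can be written as a pair of one-dimensional checks, applied once per axis. This is only a sketch of the arithmetic: the function names are hypothetical, and all sizes are assumed to be in pixels along a single axis.

```cpp
#include <cassert>

// Hypothetical helpers; sizes are in pixels along one axis (width or height).

// Left/top condition from the text: tile_size < border_size.
inline bool leadingBorderMayOverlap(long tileSize, long borderSize)
{
    assert(tileSize > 0 && borderSize >= 0);
    return borderSize > 1 && tileSize < borderSize;
}

// Right/bottom condition from the text: the last (remainder) tile is
// non-empty but thinner than the filter border.
inline bool trailingBorderMayOverlap(long imageSize, long tileSize, long borderSize)
{
    assert(imageSize > 0 && tileSize > 0 && borderSize >= 0);
    const long remainder = imageSize % tileSize;
    return borderSize > 1 && remainder > 0 && remainder < borderSize;
}
```

For example, with an image height of 240, a tile height of 34, and a border of 3 pixels, the remainder is 2 rows, so the bottom check reports a possible overlap. With a tile height of 60 the remainder is 0 and no overlap is reported.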
You can ignore overlapped borders if you do not need the bit-exact quality of tiling around image boundaries. But to provide the same result as without tiling, you must tune the tile size manually to avoid overlapping or use special Integration Wrappers APIs, which can handle this problem for you. For more details, see the sections below.
The sections below explain the following IW tiling techniques:
  • Manual tiling
  • IwiTile-based tiling (basic tiling and pipeline tiling)

Manual tiling

IW functions are designed to be tiled using the IwiTile and IwsTile interfaces for image and signal functions, respectively. If for some reason automatic tiling with IwiTile is not suitable, there are special APIs to perform tiling manually.
When using manual tiling, you need to:
  • Shift images to the correct position for a tile using iwiImage_GetRoiImage
  • If necessary, pass the correct border InMem flags to a function using iwiTile_GetTileBorder
  • If necessary, check the filter border around the image border using iwiTile_CorrectBordersOverlap
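To make the idea of position-dependent InMem flags more concrete, here is a simplified model of the decision for one tile. It is not the implementation of iwiTile_GetTileBorder: the flag values, the structures, and the `tileInMemFlags` function are hypothetical, and the sketch assumes that any border overlap has already been resolved, so each side of the border is either entirely inside or entirely outside the image.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only. Real code should use the
// IwiBorderType value returned by the IW API instead.
enum InMemFlags : unsigned {
    kInMemNone   = 0u,
    kInMemTop    = 1u << 0,
    kInMemBottom = 1u << 1,
    kInMemLeft   = 1u << 2,
    kInMemRight  = 1u << 3
};

// Hypothetical types; not the IW IwiSize/IwiRoi/IwiBorderSize types.
struct Size       { long width, height; };
struct Roi        { long x, y, width, height; };
struct BorderSize { long left, top, right, bottom; };

// A border side is treated as "in memory" when all pixels the filter needs
// on that side of the tile are real image pixels.
inline unsigned tileInMemFlags(const Roi& tile, const BorderSize& border, const Size& image)
{
    assert(tile.x >= 0 && tile.y >= 0);
    assert(tile.x + tile.width <= image.width && tile.y + tile.height <= image.height);

    unsigned flags = kInMemNone;
    if (tile.x >= border.left)                                 flags |= kInMemLeft;
    if (tile.y >= border.top)                                  flags |= kInMemTop;
    if (tile.x + tile.width  + border.right  <= image.width)   flags |= kInMemRight;
    if (tile.y + tile.height + border.bottom <= image.height)  flags |= kInMemBottom;
    return flags;
}
```

In this model, a full-width tile in the middle of the image gets only the top and bottom flags: the rows above and below it exist in memory, while the left and right borders still have to be extrapolated.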
Here is an example of IW threading with OpenMP* using manual tiling:
    #include <iostream>
    #include "iw++/iw.hpp"
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(int, char**)
    {
        int fail = 0;

        // Create images
        ipp::IwiImage srcImage, cvtImage, dstImage;
        srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
        cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
        dstImage.Alloc(srcImage.m_size, ipp16s, 1);

    #ifdef _OPENMP
        int threads = omp_get_max_threads(); // Get the number of threads
    #else
        int threads = 4; // Just divide the image to process it by tiles
    #endif
        ipp::IwiSize       tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
        ipp::IwiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
        ipp::IwiBorderType border = ippBorderRepl;

    #ifdef _OPENMP
        #pragma omp parallel num_threads(threads)
    #endif
        {
            // Declare thread-scope variables
            ipp::IwiBorderType threadBorder;
            ipp::IwiImage      srcTile, cvtTile, dstTile;

            try
            {
                // Color convert threading
    #ifdef _OPENMP
                #pragma omp for
    #endif
                for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
                {
                    ipp::IwiRoi tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle

                    // Get images for current ROI
                    srcTile = srcImage.GetRoiImage(tile);
                    cvtTile = cvtImage.GetRoiImage(tile);

                    // Run functions
                    ipp::iwiColorConvert(srcTile, iwiColorRGB, cvtTile, iwiColorGray);
                }

                // Sobel threading
    #ifdef _OPENMP
                #pragma omp for
    #endif
                for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
                {
                    ipp::IwiRoi tile(0, row, tileSize.width, tileSize.height); // Create actual tile rectangle
                    ipp::IwiTile::CorrectBordersOverlap(tile, border, sobBorderSize, cvtImage.m_size); // Check borders overlap and correct tile if necessary
                    threadBorder = ipp::IwiTile::GetTileBorder(tile, border, sobBorderSize, cvtImage.m_size); // Get actual tile border

                    // Get images for current ROI
                    cvtTile = cvtImage.GetRoiImage(tile);
                    dstTile = dstImage.GetRoiImage(tile);

                    // Run functions
                    ipp::iwiFilterSobel(cvtTile, dstTile, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), threadBorder);
                }
            }
            catch(...)
            {
                fail = 1;
            }
        }

        if(fail)
        {
            std::cout << "Failure!\n";
            return 1;
        }
        std::cout << "Success!\n";
        return 0;
    }
Several simplified IW versions of complex Intel IPP functions cannot be tiled manually because of interface limitations. If such a limitation exists, it is specified in the function's entry in the header file and in the reference section of this document.

IwiTile-based tiling

IwiTile is the main interface structure for tiling in IW. This interface has two associated APIs:
  • Basic tiling API with the iwiTile_ prefix
  • Pipeline tiling API with the iwiTilePipeline_ prefix
Most IW image processing functions have the IwiTile parameter. For example, see the API of the iwiFilterSobel function:
    iwiFilterSobel(
        const IwiImage             *pSrcImage,
        IwiImage                   *pDstImage,
        IwiDerivativeType           opType,
        IppiMaskSize                kernelSize,
        const IwiFilterSobelParams *pAuxParams,
        IwiBorderType               border,
        const Ipp64f               *pBorderVal,
        const IwiTile              *pTile
    );
  • pSrcImage and pDstImage are initialized with the size of the whole source and destination images, respectively
  • pTile is a pointer to the IwiTile structure. You do not need to shift input/output buffers and check borders manually. The IwiTile initialization function and the processing function will place input and output buffers automatically. If you do not need to use tiling, pass NULL to pTile, and the whole image will be processed at once.
If a function does not have the IwiTile parameter, it means that the function cannot be tiled because of algorithmic limitations. You can use manual tiling for such functions, but it may produce incorrect results.
Basic tiling
You can use basic tiling to tile or thread one standalone function or a group of functions without borders. To apply basic tiling, initialize the IwiTile structure with the current tile ROI and pass it to the processing function.
For functions operating with different sizes for source and destination images, use the destination size as a base for tile parameters.
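The split of the destination image into one horizontal tile per thread can be sketched independently of IW. The `RowTile` structure and the `splitRows` function below are hypothetical. The sketch shortens the last tile explicitly so that it never extends past the image; this is a conservative choice made for the illustration, not a statement about how IW treats an oversized tile.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical type: a horizontal strip of the destination image.
struct RowTile { long y, height; };

// Split imageHeight rows into at most `parts` strips of equal height
// (ceiling division). The last strip is shortened to stay inside the image.
inline std::vector<RowTile> splitRows(long imageHeight, long parts)
{
    assert(imageHeight > 0 && parts > 0);
    const long tileHeight = (imageHeight + parts - 1) / parts;

    std::vector<RowTile> tiles;
    for (long y = 0; y < imageHeight; y += tileHeight)
        tiles.push_back(RowTile{y, std::min(tileHeight, imageHeight - y)});
    return tiles;
}
```

With 240 rows and 4 threads this yields four strips of 60 rows. With 7 threads the strip height is 35 rows, and the seventh strip is shortened to 30 rows.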
Here is an example of IW threading with OpenMP* using basic tiling with IwiTile:
    #include <iostream>
    #include "iw++/iw.hpp"
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(int, char**)
    {
        int fail = 0;

        // Create images
        ipp::IwiImage srcImage, cvtImage, dstImage;
        srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
        cvtImage.Alloc(srcImage.m_size, ipp8u, 1);
        dstImage.Alloc(srcImage.m_size, ipp16s, 1);

    #ifdef _OPENMP
        int threads = omp_get_max_threads(); // Get the number of threads
    #else
        int threads = 4; // Just divide the image to process it by tiles
    #endif
        ipp::IwiSize       tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
        ipp::IwiBorderType border = ippBorderRepl;

    #ifdef _OPENMP
        #pragma omp parallel num_threads(threads)
    #endif
        {
            // Declare thread-scope variables
            ipp::IwiRoi roi;

            try
            {
                // Color convert threading
    #ifdef _OPENMP
                #pragma omp for
    #endif
                for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
                {
                    // Run functions with the current tile rectangle
                    ipp::iwiColorConvert(srcImage, iwiColorRGB, cvtImage, iwiColorGray, IwValueMax, ipp::IwDefault(), ipp::IwiRoi(0, row, tileSize.width, tileSize.height));
                }

                // Sobel threading
    #ifdef _OPENMP
                #pragma omp for
    #endif
                for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
                {
                    // Run functions with the current tile rectangle
                    ipp::iwiFilterSobel(cvtImage, dstImage, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), border, ipp::IwiRoi(0, row, tileSize.width, tileSize.height));
                }
            }
            catch(...)
            {
                fail = 1;
            }
        }

        if(fail)
        {
            std::cout << "Failure!\n";
            return 1;
        }
        std::cout << "Success!\n";
        return 0;
    }
Pipeline tiling
With the IwiTile interface, you can easily tile pipelines by applying the current tile to an entire pipeline at once instead of tiling each function one by one. This operation requires border handling and tracking of pipeline dependencies, which increases the complexity of the API. But when used properly, pipeline tiling can increase the scalability of threading or the performance of non-threaded functions by performing all operations inside the CPU cache.
Here are some important details that you should take into account when performing pipeline tiling:
  1. Pipeline tiling is performed in reverse order, from destination to source. Therefore:
    • Use the tile size based on the destination image size
    • Initialize the IwiTile structure with iwiTilePipeline_Init for the last operation
    • Initialize the IwiTile structure for the other operations, from the last to the first, with iwiTilePipeline_InitChild
  2. Obtain the border size for each operation from its mask size, kernel size, or using the specific function returning the border size, if any.
  3. In case of threading, copy the initialized IwiTile structures to thread-local storage or initialize them on a per-thread basis. Access to the structures is not thread-safe.
  4. Do not exceed the maximum tile size specified during initialization. Otherwise, a buffer overflow can occur.
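The reverse, destination-to-source order in item 1 follows from how borders accumulate: each stage needs its output tile plus its own border as input, and that enlarged region becomes the output requirement of the previous stage. The sketch below models this for rows only. It is not the IW pipeline API: `Rows`, `Stage`, and `inputRowsPerStage` are hypothetical names, and all stages are assumed to work on images of the same height.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration.
struct Rows  { long y, height; };                // rows [y, y + height)
struct Stage { long borderTop, borderBottom; };  // extra input rows a stage needs

// For a destination tile, walk the stages from last to first and compute the
// range of input rows each stage has to read. Ranges are clipped to the image,
// because rows outside the image are extrapolated rather than read.
inline std::vector<Rows> inputRowsPerStage(const std::vector<Stage>& stages,
                                           Rows dstTile, long imageHeight)
{
    assert(imageHeight > 0 && dstTile.y >= 0 && dstTile.height > 0);
    assert(dstTile.y + dstTile.height <= imageHeight);

    std::vector<Rows> needed(stages.size());
    Rows out = dstTile;  // output requirement of the stage being visited
    for (std::size_t i = stages.size(); i-- > 0; ) {
        const long top    = std::max(0L, out.y - stages[i].borderTop);
        const long bottom = std::min(imageHeight, out.y + out.height + stages[i].borderBottom);
        needed[i] = Rows{top, bottom - top};
        out = needed[i];  // the previous stage must produce exactly these rows
    }
    return needed;
}
```

For a two-stage pipeline of a borderless color conversion followed by a 3x3 filter (one border row on each side), a destination tile covering rows 60 to 119 requires the conversion to produce rows 59 to 120, that is 62 rows. This is why an intermediate buffer in a tiled pipeline is generally larger than the tile itself.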
The following example demonstrates IW threading with OpenMP* using IwiTile pipeline tiling:
    #include <iostream>
    #include "iw++/iw.hpp"
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(int, char**)
    {
        int fail = 0;

        // Create images
        ipp::IwiImage srcImage, dstImage;
        srcImage.Alloc(ipp::IwiSize(320, 240), ipp8u, 3);
        dstImage.Alloc(srcImage.m_size, ipp16s, 1);

    #ifdef _OPENMP
        int threads = omp_get_max_threads(); // Get the number of threads
    #else
        int threads = 4; // Just divide the image to process it by tiles
    #endif
        ipp::IwiSize       tileSize(dstImage.m_size.width, (dstImage.m_size.height + threads - 1)/threads); // One tile per thread
        ipp::IwiBorderSize sobBorderSize = iwiSizeToBorderSize(iwiMaskToSize(ippMskSize3x3)); // Convert mask size to border size
        ipp::IwiBorderType border = ippBorderRepl;

    #ifdef _OPENMP
        #pragma omp parallel num_threads(threads)
    #endif
        {
            // Declare thread-scope variables
            ipp::IwiImage        cvtImage;
            ipp::IwiTilePipeline roiConvert, roiSobel;

            try
            {
                roiSobel.Init(tileSize, dstImage.m_size, border, sobBorderSize); // Initialize last operation ROI first
                roiConvert.InitChild(roiSobel); // Initialize next operation as a dependent

                // Allocate intermediate buffer
                cvtImage.Alloc(roiConvert.GetDstBufferSize(), ipp8u, 1);

                // Joined pipeline threading
    #ifdef _OPENMP
                #pragma omp for
    #endif
                for(ipp::IwSize row = 0; row < dstImage.m_size.height; row += tileSize.height)
                {
                    roiSobel.SetTile(ipp::IwiRoi(0, row, tileSize.width, tileSize.height)); // Set IwiRoi chain to current tile coordinates

                    // Run functions
                    ipp::iwiColorConvert(srcImage, iwiColorRGB, cvtImage, iwiColorGray, IwValueMax, ipp::IwDefault(), roiConvert);
                    ipp::iwiFilterSobel(cvtImage, dstImage, iwiDerivHorFirst, ippMskSize3x3, ipp::IwDefault(), border, roiSobel);
                }
            }
            catch(...)
            {
                fail = 1;
            }
        }

        if(fail)
        {
            std::cout << "Failure!\n";
            return 1;
        }
        std::cout << "Success!\n";
        return 0;
    }
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
