• 04/03/2020
  • Public Content
Contents

OpenVX* Performance Tips

A rule of thumb for OpenVX* is applicability of general “compute” optimizations, as from the optimization point of view, the OpenVX is just another API for compute. Thus, common tips like providing the sufficient parallel slack, avoiding redundant copies or extensive synchronization, and so on, hold true. These are especially pronounced for heterogeneous scenarios, as latencies of communicating to accelerators are involved.
There are specific actions to optimize the performance of an OpenVX application. These actions are as follows:  
  • Use virtual images whenever possible, as this unlocks many graph compiler optimizations.
  • Whenever possible, prefer standard nodes and/or extensions over user kernel nodes (which serve as memory and execution barriers, hindering performance). This gives the Pipeline Manager much more flexibility to optimize the graph execution.
  • If you still need to implement a user node, base it on the Advanced Tiling Extensions (see the Intel's Extensions to the OpenVX* API: Advanced Tiling chapter)
  • If the application has independent graphs, run these graphs in parallel using
    vxScheduleGraph
    API call.
  • Provide enough parallel slack to the scheduler- do not break work (for example, images) into too many tiny pieces. Consider kernel fusion.
  • For images, use smallest data type that fits the application accuracy needs (for example, 32->16->8 bits).
  • Consider heterogeneous execution (see the Heterogeneous Computing with OpenVINO™ toolkit chapter).
  • You can create an OpenVX image object that references a memory that was externally allocated (
    vxCreateImageFromHandle
    ). To enable zero-copy with the GPU the externally allocated memory should be aligned.  For more details, refer to https://software.intel.com/en-us/node/540453.
  • Beware of the (often prohibitive)
    vxVerifyGraph
    latency costs. For example, construct the graph in a way it would not require the verification upon the parameters updates. Notice that unlike Map/Unmap for the input images (see the Map/Unmap for OpenVX* Images section), setting new images with different meta-data (size, type, etc) almost certainly triggers the verification, potentially adding significant overhead.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804