Tutorial: OpenCL™ Interoperability with Video Acceleration API on Linux* OS

Published:01/19/2015   Last Updated:01/19/2015

Download PDF[PDF 265KB]

OpenCL™ Code Builder

Contents

Introduction
Key Takeaways
Check VA Sharing Capabilities and Get OpenCL™ Functions
Creating VA Sharing OpenCL Context
Creating VA Sharing OpenCL Image
Using VA Sharing OpenCL Image
References

Introduction

Video Acceleration API (VA API) is an open source cross-platform API. VA API leverages acceleration of video encoding and decoding when hardware support is available. When a particular accelerator is a GPU, video surfaces used by VA API are also located on the GPU side. In turn, OpenCL™ software technology enables general processing of this data on the GPU through OpenCL API. To share the data between VA API and OpenCL API efficiently, Intel provides the cl_intel_va_api_media_sharing OpenCL extension. The goal of this extension is to provide interoperability between OpenCL and VA API, by sharing VA API surfaces with Intel’s implementation of the OpenCL standard without extra copy operations. The extension uses mechanisms and provides surface sharing capabilities for VA API that are similar to the ones covered by the OpenCL 1.2 Media Surface Sharing Khronos* extensions for DirectX* 9/DirectX 10/DirectX 11. To get access to the extension, install the Media Server Studio.

Key Takeaways

The cl_intel_va_api_media_sharing extension enables sharing VA surface data with the OpenCL runtime using images acquired directly from VA surface planes. As a result of such sharing, OpenCL kernels become able to process VA surfaces without extra coping to OpenCL images. If the VA API Media Sharing extension support is available (see next section for details), you need to make the following modifications to leverage the efficient sharing flow in the OpenCL code (all the steps are detailed in the rest of document):

  1. Get pointers to the following extension functions:

    • clGetDeviceIDsFromVA_APIMediaAdapterINTEL
    • clCreateFromVA_APIMediaSurfaceINTEL
    • clEnqueueAcquireVA_APIMediaSurfacesINTEL
    • clEnqueueReleaseVA_APIMediaSurfacesINTEL
  2. Instead of clGetDeviceIDs, use the clGetDeviceIDsFromVA_APIMediaAdapterINTEL function to get devices corresponding to the VA adapter.
  3. Pass VA adapter handle into clCreateContext through the properties argument.
  4. Acquire the shared VA surfaces prior to OpenCL kernel execution and release the surfaces after the kernel completes the execution.

Check VA Sharing Capabilities and Get OpenCL™ Functions

Assuming that that the OpenCL platform is already selected (see “Platform/Device Capabilities Viewer” sample for an example of how to select an OpenCL platform), this tutorial also assumes that the OpenCL platform supports the VA Media Sharing extension. First step to check that the extension is supported - is to use the clGetPlatformInfo function.

size_t  len = 0; // length for extension string

// get size for string with extension names
clGetPlatformInfo(
	clPlatform,		    // the OpenCL platform to check
	CL_PLATFORM_EXTENSIONS, // the property name
	0,NULL,
	&len);			    // buffer length required for extension names

// allocate buffer to store extensions names 
char* str = (char*)malloc(len);

//get string with extension names
clGetPlatformInfo(
    clPlatform,		 // the OpenCL platform to check
    CL_PLATFORM_EXTENSIONS, // the platform property name
    len,            // allocated buffer length
    str,            // allocated buffer to store extension names
    NULL);

The str is a char buffer that is filled by a null-terminated string that contains a list of space-separated extension names. Below is a sample content of str:

cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_motion_estimation cl_intel_subgroups cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir”.

If the extensions string has cl_intel_va_api_media_sharing entry, then the platform supports the extension and you can use this platform for media sharing.

Next step is to get media sharing function pointers. The following functions are required to use the media sharing

  • clGetDeviceIDsFromVA_APIMediaAdapterINTEL enables you to get OpenCL devices corresponding to the VA adapter.
  • clCreateFromVA_APIMediaSurfaceINTEL creates an OpenCL 2D image object from a VA media surface. The new image object shares the same data as the media surface and is valid as long as the corresponding VA surface exists.
  • clEnqueueAcquireVA_APIMediaSurfacesINTEL enables you to acquire OpenCL memory objects so that consecutive OpenCL commands, queued to a command-queue, can use the acquired memory objects.
  • clEnqueueReleaseVA_APIMediaSurfacesINTEL releases the acquired OpenCL memory objects, returning ownership to VA API.

The pointer to each function can be obtained by the clGetExtensionFunctionAddressForPlatform OpenCL call. The functions are described in the extension specification and declared in CL/va_ext.h header provided with the Intel® Media Server Studio. The example below illustrates how to get pointer for the clGetDeviceIDsFromVA_APIMediaAdapterINTEL function

#include <CL/va_ext.h>

// define variables for the extension function pointers 
clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn clGetDeviceIDsFromVA_APIMediaAdapterINTEL = NULL;
 
//get pointer to the function and store it in the variable
clGetDeviceIDsFromVA_APIMediaAdapterINTEL = (clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn)clGetExtensionFunctionAddressForPlatform(
	clPlatfrom, 							// the chosen platform
	"clGetDeviceIDsFromVA_APIMediaAdapterINTEL"	);	// the name of function

//check that returned pointer is not NULL
if(NULL == clGetDeviceIDsFromVA_APIMediaAdapterINTEL)
{
  printf(“Can’t get pointer to clGetDeviceIDsFromVA_APIMediaAdapterINTEL”);
  return EXIT_FAILURE;
}

For convenience purposes consider keeping the pointer name equal to the function name. You can query the rest of the function pointers similarly.

Creating VA Sharing OpenCL Context

Once all media sharing functions are available through pointers, you can create the OpenCL context using the device that corresponds to the VA adapter. To enumerate such devices, use the clGetDeviceIDsFromVA_APIMediaAdapterINTEL function. You can use this function similarly to an ordinary OpenCL clGetDeviceIDs routine except the fact that a VA adapter and additional VA-related flags are used as input parameters. The code below uses CL_PREFERRED_DEVICES_FOR_VA_API_INTEL flag to get a list of devices that is preferred for media sharing.

cl_int        err = CL_SUCCESS;			//error code
cl_device_id* pDevices = NULL;	//buffer for device ids
cl_uint       nDevices = 0;	//number of devices

// get number of devices corresponding to VA adapter
err = clGetDeviceIDsFromVA_APIMediaAdapterINTEL(
	clPlatform, 			// the selected OpenCL platform
	CL_VA_API_DISPLAY_INTEL,   // media adapter type
    	vaDisplay,			// media adapter handle
	CL_PREFERRED_DEVICES_FOR_VA_API_INTEL, // set of OpenCL devices
	NULL,NULL,
	&nDevices); // pointer to variable to store number of devices

if(CL_SUCCESS != err || 0 == nDevices)
{
	printf("Can’t get OpenCL device for media sharing");
	return EXIT_FAILURE;
}

// allocate buffer for device ids
pDevices = (cl_device_id*)malloc(sizeof(cl_device_id) * nDevices);

// get device ids corresponding to VA adapter
err = clGetDeviceIDsFromVA_APIMediaAdapterINTEL(
	clPlatform, 			// the chosen platform
	CL_VA_API_DISPLAY_INTEL,   // media adapter type
    	vaDisplay,			// media adapter
	CL_PREFERRED_DEVICES_FOR_VA_API_INTEL, // set of opencl devices
	nDevices, 			// size of output pDevices buffer
	pDevices,			// pointer to output buffer
	NULL);
//check error here …

After calling the function, the pDevices buffer contains OpenCL device IDs that correspond to a given vaDisplay adapter. If there is one or more devices corresponding to VA adapter, then you can create the OpenCL context for media sharing. Create the context in a common way via the clCreateContext function, but pass an additional property (for example, CL_CONTEXT_VA_API_DISPLAY_INTEL). The example below creates context for all devices returned by clGetDeviceIDsFromVA_APIMediaAdapterINTEL.

cl_int 	err;		// error code
cl_context	clContext; 	// context object

// the properties list that is used to create context
cl_context_properties props[] = { 
	// VA adapter
	CL_CONTEXT_VA_API_DISPLAY_INTEL, (cl_context_properties) vaDisplay, 
	// declare that user does not give guarantee that any VA work is finished
       // before clEnqueueAcquireVA_APIMediaSurfacesINTEL will be called
	CL_CONTEXT_INTEROP_USER_SYNC, CL_FALSE, 
	0};

//create context
clContext = clCreateContext(
	props, 	// properties with VA adapter
	nDevices, 	// number of devices used for context creation
	pDevices,	// buffer with device IDs
	NULL, NULL,  // callback function is not used here
	&err);		// returned error
// check error

The properties props is a list of pairs of property names and their corresponding values. The property list in this example has two components:

  • A VA adapter vaDisplay that you should pass into clCreateContext to create a media sharing-enabled context.
  • The CL_CONTEXT_INTEROP_USER_SYNC parameter. If this parameter is CL_FALSE then clEnqueueAcquireVA_APIMediaSurfacesINTEL guarantees that all VA calls finish before clEnqueueAcquireVA_APIMediaSurfacesINTEL finishes. If the parameter is CL_TRUE then you have to make sure that all VA calls finish before clEnqueueAcquireVA_APIMediaSurfacesINTEL.

Creating VA Sharing OpenCL Image

At this stage you can create cl_mem (image) object from VA surface via clCreateFromVA_APIMediaSurfaceINTEL function for the shared OpenCL context.

This function accepts an ID of the VA surface that is used as a base for the corresponding OpenCL image. The code below shows how to create four images for Y and UV planes for input and output VA_FOURCC_NV12 VA surfaces. NV12 surface has two planes. The first one is a full-sized (width,height) one-channel Y plane with an intensity component, the second one is half-sized (width/2,height/2) two-channel UV plane with color components.

// create input 2D images to share Y and UV planes for VA input surface
clImageInY = clCreateFromVA_APIMediaSurfaceINTEL(
	clContext, 		// context created for media sharing
	CL_MEM_READ_ONLY, 	// create 2D image for reading (by OpenCL kernels)
	vaSurfaceIn,		// input VA surface for sharing
	0,			// Create 2D image for Y plane for NV12 format
	&err);			// error code
// check error here…
clImageInUV = clCreateFromVA_APIMediaSurfaceINTEL(
	clContext, 		// context created for media sharing
	CL_MEM_READ_ONLY, 	// create 2D image for reading (by OpenCL kernels)
	vaSurfaceIn,		// input VA surface for sharing
	1,			// Create 2D image for UV plane for NV12 format
	&err);			// error code
//check err here

// create output 2D images to share Y and UV planes for VA output surface
clImageOutY = clCreateFromVA_APIMediaSurfaceINTEL(
	clContext, 		// context created for media sharing
	CL_MEM_WRITE_ONLY,	// create 2D image for writing (by OpenCL kernels)
	vaSurfaceOut,	       // output VA surface for sharing
	0,			// Create 2D image for Y plane for NV12 format
	&err);			// error code
// check error here…
clImageOutUV = clCreateFromVA_APIMediaSurfaceINTEL(
	clContext, 		// context created for media sharing
	CL_MEM_WRITE_ONLY,	// create 2D image for writing (by OpenCL kernels)
	vaSurfaceOut,	// output VA surface for sharing
	1,			// Create 2D image for UV plane for NV12 format
	&err);			// error code
//check err here

Here vaSurfaceIn and vaSurfaceOut are input and output NV12 VA surfaces to be shared as OpenCL 2D image objects. The clImageInY, clImageInUV and clImageOutY, clImageOutUV are new OpenCL image objects that are based on the vaSurfaceIn and vaSurfaceOut.

  1. clImageInY shares Y plane of input VA surface
  2. clImageInUV shares UV plane of input VA surface
  3. clImageOutY shares Y plane of output VA surface
  4. clImageOutUV shares UV plane of output VA surface

The CL_MEM_READ_ONLY flag is used for input images as only read operations are made for these images in the OpenCL kernels. To create output images for writing use the CL_MEM_WRITE_ONLY flag. To create image for both reading and writing use the CL_MEM_READ_WRITE flag.

Using VA Sharing OpenCL Image

Upon creating OpenCL memory objects from VA surfaces, you can use any regular OpenCL kernel that operates on images to process the VA surfaces. Note that NV12 surface has Y plane with (width, height) sizes and UV half-sized plane (width/2, height/2). So UV plane has one color pair for four neighboring pixels. Consequently, an OpenCL kernel that processes at least a 2x2 pixel block is more suitable for NV12 format and the created images.

One OpenCL Work Item

For example, consider the following kernel:

//some predefined parameters for color correction
#define SCALE_COLOR     0.75f
#define SCALE_INTENCITY 1.5f
// define simple sampler
const sampler_t smp = CLK_FILTER_NEAREST;

// There is nothing special to process shared surfaces
// so the ordinary OpenCL kernel that process image object is used.
// This kernel makes a simple color correction using the predefined parameters
__kernel void ProcessYUV(
    __read_only image2d_t srcY, //input Y plane
    __read_only image2d_t srcUV,//input UV plane
    __write_only image2d_t dstY,//output Y plane
    __write_only image2d_t dstUV)//output UV plane
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    // single coordinate for single UV value
    const int2    coordUV = (int2)(x,y);
    // four coordinates for four Y values
    const int2    coordY[4] = {
        (int2)(2*x,2*y),
        (int2)(2*x+1,2*y),
        (int2)(2*x,2*y+1),
        (int2)(2*x+1,2*y+1)
    };

    float4 UV,Y[4];
    // read UV plane component
    UV = read_imagef(srcUV, smp, coordUV);
    // read 4 intensity components from input Y plane
    Y[0] = read_imagef(srcY, smp, coordY[0]);//read Y plane 1 component
    Y[1] = read_imagef(srcY, smp, coordY[1]);//read Y plane 2 component
    Y[2] = read_imagef(srcY, smp, coordY[2]);//read Y plane 3 component
    Y[3] = read_imagef(srcY, smp, coordY[3]);//read Y plane 4 component

    // process single color component
    UV.xy = (UV.xy-0.5f)*SCALE_COLOR+0.5f;
    // process four intensity components 
    Y[0].x = (Y[0].x-0.5f)*SCALE_INTENCITY+0.5f;
    Y[1].x = (Y[1].x-0.5f)*SCALE_INTENCITY+0.5f;
    Y[2].x = (Y[2].x-0.5f)*SCALE_INTENCITY+0.5f;
    Y[3].x = (Y[3].x-0.5f)*SCALE_INTENCITY+0.5f;
    
    // write new color component into output UV plane
    write_imagef(dstUV, coordUV, UV);
    // write 4 new intensity components into output Y plane
    write_imagef(dstY, coordY[0],Y[0]);
    write_imagef(dstY, coordY[1],Y[1]);
    write_imagef(dstY, coordY[2],Y[2]);
    write_imagef(dstY, coordY[3],Y[3]);
}

This kernel makes simple color and intensity corrections. The result of such processing is shown in the following pictures. The left one is an input image and the right one is an output image.

color correction

Before enqueueing this kernel with the regular clEnqueueNDRangeKernel function, call the clEnqueueAcquireVA_APIMediaSurfacesINTEL function to schedule acquiring of the VA surfaces. Then you can submit the kernel into the queue for execution. You can submit several kernels that process the acquired objects. Once you submit all the necessary kernels, call MclEnqueueReleaseVA_APIMediaSurfacesINTEL to release the acquired VA surfaces. The code example below shows how to submit the kernel for execution to process OpenCL images that share VA surfaces.

// the processed shared VA surfaces has to be acquired
// before submit OpenCL kernel to process corresponding images.
// place all image objects into single buffer to acquire all of them 
cl_mem    images[4] ={
    clImageInY,     //input image with Y plane of VA surface
    clImageInUV,    //input image with UV plane of VA surface
    clImageOutY,    //output image with Y plane of VA surface
    clImageOutUV    //output image with UV plane of VA surface
};

//no need for explicit sync with VA before AcquireVA
//as the context was created with CL_CONTEXT_INTEROP_USER_SYNC==CL_FALSE 

clEnqueueAcquireVA_APIMediaSurfacesINTEL(
    clQueue,    // queue for media sharing device
    4,          // number of images in "images" buffer
    images,     // buffer with images to be processed
    0,  NULL, NULL); // no any events

// set arguments for kernel that 
// will process acquired VA surface through images 
clSetKernelArg(0,sizeof(cl_mem),&clImageInY);
clSetKernelArg(1,sizeof(cl_mem),&clImageInUV);
clSetKernelArg(2,sizeof(cl_mem),&clImageOutY);
clSetKernelArg(3,sizeof(cl_mem),&clImageOutUV);

// Enqueue kernel
// setup global size to execute image processing kernel
// NV12 has Y plane with (width,heigh) size
// and half sized UV plane. It is convenient to process 
// 2x2 pixel block (with the same chroma) by one workitem.
// So the used global size is with/2 by height/2
size_t GS[2] = {width/2,height/2};
clEnqueueNDRangeKernel(
    clQueue,    // queue for media sharing device
    clKernel,   // OpenCL kernel
    2,          // GS has 2 elements 
    NULL,       // no offset
    GS,         // global sizes for processing
    NULL,       // use default local size
    0, NULL, NULL);//no events

// release images that were acquired for OpenCL kernel processing
clEnqueueReleaseVA_APIMediaSurfacesINTEL(
    clQueue,    // queue for media sharing device
    4,          // number of images in "images" buffer
    images,     // buffer with processed images
    0,  NULL, NULL); // no any events

References

  1. Video Acceleration API
  2. OpenCL™ Platform/Device Capabilities Viewer Sample
  3. cl_intel_dx9_media_sharing OpenCL™ platform extension
  4. Intel® Media Server Studio 2015
  5. cl_intel_va_api_media_sharing OpenCL platform extension

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.