Overview: Asynchronous Offloading

This topic only applies to Intel® 64 and IA-32 architectures targeting Intel® Graphics Technology.

Synchronous and Asynchronous Offloading

The compiler provides two heterogeneous offload programming models that enable you to use the processor graphics:

  • Synchronous offload:

    • Access this model by using a _Cilk_for loop as the parallel loop under #pragma offload.

    • The CPU waits for the offload task to complete before continuing execution.

    • The compiler handles data sharing and kernel creation based on an offload region containing a parallel _Cilk_for loop.

  • Asynchronous offload:

    • Access this model using an API.

    • The CPU continues execution until it is requested to wait for a kernel to complete.

    • You have more control over data sharing and kernel enqueueing. Data sharing and kernel enqueueing are separate, so multiple kernels can share data.
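The difference in control flow between the two models can be sketched with standard C++ on the CPU alone. This is only an analogy: it does not use #pragma offload, _Cilk_for, or the gfx_rt.h API, and the names sum_kernel, run_synchronous, and run_asynchronous are hypothetical helpers invented for this illustration. A std::future stands in for an offloaded task that the CPU waits on later.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Stand-in for an offloaded kernel (hypothetical). On real hardware this
// work would run on the processor graphics; here it runs on the CPU.
long long sum_kernel(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Synchronous model: the caller blocks until the kernel has finished.
long long run_synchronous(const std::vector<int>& data) {
    return sum_kernel(data);
}

// Asynchronous model: the kernel is launched on another thread, the
// caller does unrelated work, and only then waits for the result.
long long run_asynchronous(const std::vector<int>& data, int& cpu_work_done) {
    std::future<long long> task = std::async(
        std::launch::async, [&data] { return sum_kernel(data); });

    // The CPU keeps executing while the kernel is in flight.
    cpu_work_done = 0;
    for (int i = 0; i < 1000; ++i) {
        cpu_work_done += 1;
    }

    // Explicit wait: the point where the CPU asks for completion.
    return task.get();
}
```

Both functions return the same result. The asynchronous variant differs only in that the caller chooses where to wait, which is what makes overlapping CPU work with the offloaded task possible.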

About the Asynchronous Offload API

The Intel® Graphics Technology runtime and the gfx_rt.h header file provide an asynchronous API for organizing queued offload of user-defined kernel functions and data sharing between the CPU and processor graphics, with little extra programming effort. You can use this API in conjunction with named or direct kernels written using _Cilk_for as the parallel loop in the kernel entry point.

The API includes functions for the following purposes:

  • Putting the task into the in-order offload queue:

    • GfxTaskId _GFX_enqueue

  • Waiting for task completion

  • Managing shared linear data

  • Creating and destroying 2D images for processor graphics operations:

    • GfxImage2D (C++ interface, class constructor)

    • GfxSharedImage2D (C++ interface, class constructor)

    • GfxResourceHandle _GFX_create_image_2d (C interface)

    • _GFX_close_resource_handle (C interface)

  • Synchronizing the content of 2D images between CPU and GPU:

    • GfxImage2D::write (C++ interface)

    • GfxSharedImage2D::write (C++ interface)

    • GfxImage2D::read (C++ interface)

    • GfxSharedImage2D::read (C++ interface)

    • _GFX_read_image_2d (C interface)

    • _GFX_write_image_2d (C interface)
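The idea of an in-order queue with task identifiers that can be waited on can be modeled in a few lines of standard C++. The sketch below is a single-threaded, CPU-only model and is not the gfx_rt.h API: the class InOrderQueue, its enqueue and wait members, and the helper run_two_tasks_on_shared_buffer are hypothetical names invented for illustration, and the real runtime's signatures and behavior may differ. It shows two tasks operating on one buffer that is set up separately from enqueueing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical CPU-only model of an in-order task queue. It is NOT the
// gfx_rt.h API; all names here are invented for illustration.
class InOrderQueue {
public:
    using TaskId = std::size_t;

    // Queue a task and return an identifier that can be waited on later.
    TaskId enqueue(std::function<void()> task) {
        pending_.push_back(std::move(task));
        return next_id_++;
    }

    // Run queued tasks in submission order until task `id` has completed.
    void wait(TaskId id) {
        while (completed_ <= id && !pending_.empty()) {
            pending_.front()();
            pending_.pop_front();
            ++completed_;
        }
    }

    // Number of tasks that have finished so far.
    std::size_t completed() const { return completed_; }

private:
    std::deque<std::function<void()>> pending_;
    TaskId next_id_ = 0;
    std::size_t completed_ = 0;
};

// Two tasks operate on the same buffer, which is created once,
// independently of enqueueing. Returns the buffer after both have run.
std::vector<int> run_two_tasks_on_shared_buffer() {
    std::vector<int> buffer = {1, 2, 3};
    InOrderQueue queue;

    queue.enqueue([&buffer] { for (int& x : buffer) x *= 2; });
    InOrderQueue::TaskId second =
        queue.enqueue([&buffer] { for (int& x : buffer) x += 1; });

    // In-order semantics: waiting on the second task implies the first
    // task has already run.
    queue.wait(second);
    return buffer;
}
```

Because the queue is in-order, waiting on the last enqueued task is enough to know that every earlier task has finished, so the second task can safely consume what the first one wrote to the shared buffer.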
