Unified Shared Memory

Terminology

Unified Addressing

Guarantees that all devices use a unified address space. Pointer values in the unified address space always refer to the same location in memory. The unified address space encompasses the host and one or more devices. Note that this does not require addresses in the unified address space to be accessible on all devices, only that pointer values are consistent.

Unified Shared Memory

Unified Shared Memory (USM) is a capability that, when available, provides the ability to create allocations that are visible to both host and devices. USM builds on Unified Addressing to define a shared address space in which pointer values always refer to the same location in memory. USM defines multiple tiers of increasing capability described in the following sections:

  • Explicit USM
  • Restricted USM
  • Concurrent USM
  • System USM

Note

All utility functions described below are located in the sycl namespace unless otherwise indicated.
Explicit USM

Defines capabilities for explicitly managing device memory. Programmers directly allocate device memory, and data must be explicitly copied between the host and a device. Device allocations are obtained through a SYCL device allocator instead of the system allocator. Device allocations are not accessible on the host, but the pointer values remain consistent due to Unified Addressing.

Restricted USM

Defines capabilities for implicitly sharing data between host and devices. However, Restricted USM, as the name implies, is limited in that host and device may not concurrently compute on memory in the shared address space. Restricted USM builds on Explicit USM by adding two new types of allocations, host and shared. Allocations are obtained through SYCL allocator instead of the system allocator. shared allocations may be limited by device memory.

Concurrent USM

Builds on Restricted USM by enabling concurrent access to shared allocations between host and devices. Additionally, some implementations may support a working set of shared allocations larger than device memory.

System USM

System USM extends on the previous tiers by performing all shared allocations with the normal system memory allocation routines. In particular, programmers can now use malloc or C++ new instead of sycl_malloc to create shared allocations. Likewise, free and delete are used instead of sycl::free. Note that host and device allocations are unaffected by this change and must still be allocated using their respective USM functions.

USM Allocations

Unified Shared Memory allocation types:

namespace sycl {
  namespace usm {
    enum class alloc {
      host,
      device,
      shared,
      unknown
    };
  }
}
Allocation type Description
host Allocations in host memory that are accessible by a device.
device Allocations in device memory that are not accessible by the host.
shared Allocations in shared memory that are accessible by both host and device.

C++ Allocator Interface

template <typename T, usm::alloc AllocKind, size_t Alignment = 0>
class usm_allocator {
public:
  using value_type = T;
  using pointer = T *;
  using const_pointer = const T *;
  using reference = T &;
  using const_reference = const T &;

public:
  template <typename U> struct rebind {
    typedef usm_allocator<U, AllocKind, Alignment> other;
  };

  usm_allocator() = delete;
  usm_allocator(const context &ctxt, const device &dev);
  usm_allocator(const queue &q);
  usm_allocator(const usm_allocator &other);

  // Construct an object
  // Note: AllocKind == alloc::device is not allowed
  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT != usm::alloc::device, int>::type = 0>
  void construct(pointer Ptr, const_reference Val);

  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT == usm::alloc::device, int>::type = 0>
  void construct(pointer Ptr, const_reference Val) {
    throw feature_not_supported(
        "Device pointers do not support construct on host");
  }

  // Destroy an object
  // Note:: AllocKind == alloc::device is not allowed
  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT != usm::alloc::device, int>::type = 0>
  void destroy(pointer Ptr);

  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT == usm::alloc::device, int>::type = 0>
  void destroy(pointer Ptr) {
    throw feature_not_supported(
        "Device pointers do not support destroy on host");
  }

  // Note:: AllocKind == alloc::device is not allowed
  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT != usm::alloc::device, int>::type = 0>
  pointer address(reference Val);

  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT == usm::alloc::device, int>::type = 0>
  pointer address(reference Val) const {
    throw feature_not_supported(
        "Device pointers do not support address on host");
  }

  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT != usm::alloc::device, int>::type = 0>
  const_pointer address(const_reference Val);

  template <
      usm::alloc AllocT = AllocKind,
      typename std::enable_if<AllocT == usm::alloc::device, int>::type = 0>
  const_pointer address(const_reference Val) const {
    throw feature_not_supported(
        "Device pointers do not support address on host");
  }

  // Allocate memory
  pointer allocate(size_t Size);

  // Deallocate memory
  void deallocate(pointer Ptr, size_t size);
};

Utility Functions

While the modern C++ usm_allocator interface is sufficient for specifying USM allocations and deallocations, many programmers may prefer C-style malloc-influenced APIs. As a convenience to programmers, malloc-style APIs are also defined. Additionally, other utility functions are specified to perform various operations such as memory copies and initializations as well as to provide performance hints.

Explicit USM

The following are explicit functions for unified shared memory:

Restricted USM

Restricted USM includes all the utility functions of Explicit USM. It additionally introduces new functions to support host and shared allocations.

The following are the additional restricted functions for unified shared memory:

Concurrent USM

Concurrent USM contains all the utility functions of Explicit USM and Restricted USM. It also introduces a new function, sycl::queue::mem_advise, that enables programmers to provide additional information to the underlying runtime about how different allocations are used.

General USM

The following are general functions for unified shared memory:

Informational USM

The following are informational functions for unified shared memory.

Device Information Descriptors

The following Unified Shared Memory device information descriptors are all of type bool:

info::device::usm_device_allocations

Returns true if this device supports device allocations as described in Explicit USM.

info::device::usm_host_allocations

Returns true if this device can access host allocations.

info::device::usm_shared_allocations

Returns true if this device supports shared allocations as described in Restricted USM and Concurrent USM. The device may support Restricted USM, Concurrent USM, or both.

info::device::usm_restricted_shared_allocations

Returns true if this device supports shared allocations as governed by the restrictions described in Restricted USM on the device. This property requires that property usm_shared_allocations returns true for this device.

info::device::usm_system_allocator

Returns true if the system allocator may be used instead of SYCL USM allocation mechanisms for shared allocations on this device as described in System USM.

DAGs Without Accessors

Unified Shared Memory changes how the SYCL runtime manages data movement. Because the runtime might no longer be responsible for orchestrating data movement, it makes sense to enable a way to build dependence graphs based on ordering computations rather than accesses to data inside them. Conveniently, a SYCL queue already returns an event on calls to submit. These events can be used by the programmer to wait for the submitted task to complete.

For example:

queue q;
auto dev = q.get_device();
auto ctxt = q.get_context();
float* a = static_cast<float*>(malloc_shared(10*sizeof(float), dev, ctxt));
float* b = static_cast<float*>(malloc_shared(10*sizeof(float), dev, ctxt));
float* c = static_cast<float*>(malloc_shared(10*sizeof(float), dev, ctxt));

auto e = q.submit([&](handler& cgh) {
  cgh.parallel_for<class vec_add>(range<1> {10}, [=](id<1> ID) {
    size_t i = ID[0];
    c[i] = a[i] + b[i];
  });
});
e.wait();

Coarse Grain DAGs with cgh.depends_on

While SYCL already defines the capability to wait on specific tasks, programmers should still be able to easily define relationships between tasks.

class handler {
 ...
 public:
  ...
  void depends_on(event e);
  void depends_on(const vector_class<event> &e);
};
Parameter e
Event or vector of events representing tasks required to complete before this task can begin.
Return value
None