Tutorial

  • 0.09
  • 09/09/2020
  • Public Content

Explore DPC++ with Samples from Intel

Introduction to DPC++

Data Parallel C++ (DPC++) applications are C++ programs written for parallelism. DPC++ is designed for data parallel programming and heterogeneous computing. It provides a consistent programming language (C++) and APIs across CPUs, GPUs, FPGAs, and AI accelerators. Each architecture can be programmed and used either in isolation or together, which allows developers to learn once and then program for distinct accelerators. Each class of accelerator requires an appropriate formulation and tuning of its algorithms for best performance, but the language and programming model remain consistent regardless of the target device.
DPC++ is based on SYCL* from the Khronos* Group to support data parallelism and heterogeneous computing. DPC++ programs are essentially C++ programs, which gives them broad compatibility and familiar constructs. DPC++ contains extensions to SYCL that enhance performance and productivity; these features are being driven into a future version of SYCL. For more details about SYCL, refer to version 1.2.1 of the SYCL Specification. The oneAPI.com site contains more details about DPC++ and its specifications.
This guide aims to help developers understand how to program using the oneAPI programming model, and how to target and optimize for the appropriate architecture to achieve optimal application performance.

Build and Run a Sample Project

The links below take you to the Get Started with the Intel® oneAPI Base Toolkit content for the Command Line and IDE:

Sample 1: Simple Device Offload Structure

Sample 1 uses Vector Add as the equivalent of a Hello, World! sample for data parallel programs. It shows the basic structure of a DPC++ application by demonstrating how to target an offload device. Sample 1 provides two different source files as examples of how to manage memory: you can use either buffers or Unified Shared Memory (USM). Vector Add provides both GPU and FPGA device selectors.
In this sample, you will learn how to use the basic elements (features) of DPC++ to offload a simple computation on 1D arrays to an accelerator. The basic features are:
  • A one-dimensional array of data.
  • A device selector, queue, buffer, accessor, and kernel.
  • Memory management using buffers and accessors, or using USM.
You can find a detailed code walkthrough on the Intel® Developer Zone.
Get the sample:

Sample 2: Basic DPC++ Features Defined

Using a two-dimensional stencil to simulate a wave propagating in a 2D isotropic medium, this sample walks you through the basic tenets of DPC++ step by step, with:
  • DPC++ queues (including device selectors and exception handlers).
  • DPC++ buffers and accessors.
  • The ability to call a function inside a kernel definition and pass accessor arguments as pointers. A function called inside the kernel performs the computation for a single time step: it updates the grid point specified by the global ID variable.
You can find a detailed code walkthrough on the Intel® Developer Zone.
Get the sample:

Sample 3: Optimizing for More Complex Applications

This code sample extends the DPC++ concepts reviewed in the previous sample and explains how they can be leveraged to solve complex stencil computations in 3D. Moving from 2D to 3D grid sizes can uncover common GPGPU (device) programming issues related to inefficient data access patterns, low flops-to-byte ratios, and low occupancy. This code sample shows you how DPC++ features can be used to tackle those underlying issues and optimize your performance. The sample includes:
  • DPC++ local buffers and accessors (declare local memory buffers and accessors to be accessed and managed by each DPC++ work-group).
  • Code for Shared Local Memory (SLM) optimizations.
  • DPC++ kernels (including the parallel_for function and nd_range<3> objects).
  • DPC++ queues (including a custom device selector and exception handlers).
Get the sample:

Sample 4: Introducing Synchronization

Compared to Samples 2 and 3 (stencil kernels on regular grids), this sample adds complexity in the form of a large number of moving particles and their interaction with a fixed grid of cells. This is used to illustrate new DPC++ features such as synchronization with atomic operations.
This code sample shows you how to offload to an accelerator a computation that uses the following DPC++ tools:
  • DPC++ queues (including device selectors and exception handlers).
  • DPC++ buffers and accessors (communicate data between the host and the device).
  • DPC++ kernels (including the parallel_for function and range<1> objects).
  • DPC++ atomic operations for synchronization.
  • API-based programming: use oneMKL to generate random numbers.
Get the sample:

Next Steps

Code Walkthroughs
Next, try a detailed code walkthrough on the following topics:
Determine Which Code to Offload
You can determine which parts of your code benefit from offloading to an accelerator with Intel® Advisor. The Offload Advisor feature lets you collect performance-prediction data in addition to the standard profiling capabilities; it determines which code can be offloaded to a target device to accelerate the performance of your CPU-based applications. Get Started with Intel® Advisor helps you:
  • Optimize CPU or GPU code for memory and compute with Roofline Analysis.
  • Enable more vector parallelism and improve its efficiency.
  • Model, tune, and test multiple threading designs.
  • Create and analyze data flow and dependency computation for heterogeneous algorithms.
Transform CUDA* Code into DPC++ Code
You can transform CUDA code into standards-based DPC++ code with a migration engine, the Intel® DPC++ Compatibility Tool. The Get Started Guide and User Guide help you migrate your existing CUDA applications and cover the general workflow of the migration process. The tool can transform programs composed of multiple source and header files. It also provides:
  • One-time migration ports for both kernels and API calls.
  • Inline comments that guide you in producing output that compiles with the Intel® oneAPI DPC++/C++ Compiler.
  • Command-line tools and IDE plug-ins that streamline operations.
Additional Resources
Access a wide range of tutorials, videos, and webinar replays to learn more about DPC++ and the supporting tools on the Intel® oneAPI Toolkits Training site.
Documents available:
  • Learn about oneAPI and DPC++: programming models, programming interfaces, DPC++ runtimes, APIs, and software development processes.
  • Look through our Get Started Guides for more in-depth information.
  • Look through our Tutorials for more in-depth information.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804