Programming Guide


Data Parallel C++ (DPC++)

Data Parallel C++ (DPC++) is a high-level language designed for data parallel programming productivity.

Simple DPC++ Sample Code

The best way to introduce DPC++ is through an example. Since DPC++ is based on modern C++, this example uses several features that have been added to C++ in recent years, such as lambda functions and uniform initialization. Even if you are not familiar with these features, their semantics will become clear from the context of the example. After you gain some experience with DPC++, these newer C++ features will become second nature.
The following application sets each element of an array to the value of its index, so that a[0] = 0, a[1] = 1, etc.
#include <CL/sycl.hpp>
#include <iostream>

constexpr int num=16;
using namespace sycl;

int main() {
  auto r = range{num};
  buffer<int> a{r};

  queue{}.submit([&](handler& h) {
    accessor out{a, h};
    h.parallel_for(r, [=](item<1> idx) {
      out[idx] = idx;
    });
  });

  host_accessor result{a};
  for (int i=0; i<num; ++i)
    std::cout << result[i] << "\n";
}
The first thing to notice is that there is just one source file: the host code and the offloaded accelerator code are combined in it. The second thing to notice is that the syntax is standard C++: there are no new keywords or pragmas for expressing the parallelism. Instead, the parallelism is expressed through C++ classes. For example, the buffer class on line 9 represents data that will be offloaded to the device, and the queue class on line 11 represents a connection from the host to the accelerator.
The logic of the example works as follows. Lines 8 and 9 create a buffer of 16 int elements, which have no initial value. This buffer acts like an array. Line 11 constructs a queue, which is a connection to an accelerator device. This simple example asks DPC++ to choose a default accelerator device, but a more robust application would probably examine the topology of the system and choose a particular accelerator. Once the queue is created, the example calls the submit() member function to submit work to the accelerator. The parameter to submit() is a lambda function, which executes immediately on the host. This lambda does two things. First, it creates an accessor on line 12, which can write elements of the buffer. Second, it calls the parallel_for() function on line 13 to execute code on the accelerator.
The call to parallel_for() takes two parameters. The first is the range object "r", which represents the number of elements in the buffer, and the second is a lambda function. DPC++ arranges for this lambda to be called on the accelerator once for each index in that range, i.e. once for each element of the buffer. The lambda simply assigns each buffer element its own index, using the out accessor that was created on line 12. In this simple example, there are no dependencies between the invocations of the lambda, so DPC++ is free to execute them in parallel in whatever way is most efficient for this accelerator.
After calling parallel_for(), the host part of the code continues running without waiting for the work to complete on the accelerator. However, the next thing the host does is to create a host_accessor on line 18, which reads the elements of the buffer. DPC++ knows this buffer is written by the accelerator, so the host_accessor constructor (line 18) blocks until the work submitted by the parallel_for() is complete. Once the accelerator work completes, the host code continues past line 18, and it uses the result accessor to read values from the buffer.

Additional DPC++ Resources

This introduction to DPC++ is not meant to be a complete tutorial. Rather, it just gives you a flavor of the language. There are many more features to learn, including features that allow you to take advantage of common accelerator hardware such as local memory, barriers, and SIMD. There are also features that let you submit work to many accelerator devices at once, allowing a single application to run work in parallel on many devices simultaneously.
The following resources are useful for learning and mastering DPC++:
