Hello SYCL and DPC++

Published: 04/02/2021

Welcome to my ramblings about software development in a world of XPUs. This is my first installment; I encourage feedback to help direct what I tackle in future installments! I have included a link, at the end of this article, to where you can leave comments and see what others have to say.

In this #xpublog post, I cover three things:

  1. What is an XPU - and why should I care?
  2. What is SYCL, and how can it be used to make our neurotic kitten lighten up? (You won't want to miss that.)
  3. How can I try it out for myself using the Intel DevCloud for oneAPI? (All free, easy, and very cool.)

As with all my #xpublog posts, this article has key details, including links, behind what I discuss in these short videos.

What is an XPU and why care?

[embed]https://www.youtube.com/watch?v=zNP4_uqLM3E[/embed]

Over the last twenty years, we have watched all of computing become parallel computing. This has happened because the internet, the cloud, and multicore have all become ubiquitous. Parallelism and concurrency have become everyday. The next evolution is a shift to heterogeneous computing (XPUs). This shift, while well underway, is in its infancy. In time, it will reshape all of computing. Everything will be heterogeneous.

XPU as a metaphor

Since much of computing is about, well, computing – our attention will shift to “processing units.” The job of a processing unit is to process data, in other words “compute.”

Two popular processing units are CPUs (central processing units) and GPUs (graphics processing units). We could generalize this to XPUs. After a quick internet search of the names APU, BPU, CPU, DPU, EPU, FPU, GPU, … through ZPU, it appears that, of the 26 letters of the English alphabet, only YPU might be available without overloading (there is a Yarn Processing Unit, but “unit” means a corporate division in that case). Therefore, when I say that FPGAs, DSPs, and ASICs are all XPUs, you will recognize that I’m just expanding the namespace, because XPU should be considered a metaphor rather than a literal definition.

Stop worrying and love all XPUs

Within the next decade, all programming will be XPU programming. Gone will be the days of writing a program with the assumption it will get all its computations done from a single type of processing unit (homogeneous computing).

This transformation is needed because computing is maturing, for many reasons, to include a great diversity of compute technologies. In a recent issue of CACM*, the trends and underlying causes are explored in an article titled "The Decline of Computers as a General Purpose Technology." I prefer to think of it as "Maturing of Computing," but perhaps that title won't sell as many magazines. (* The Decline of Computers as a General Purpose Technology, N. Thompson and Svenja Spanuth, Communications of the ACM, March 2021, vol. 64, no. 3.)

What is SYCL? DPC++?

(lightening our neurotic kitten’s image in parallel)

[embed]https://www.youtube.com/watch?v=7Ff0JwxNlFQ[/embed]

In the video, I walk through a simple SYCL code that helps lighten up our kitten Nermal’s image.

In that video, I refer to a separate video on how to sign up for DevCloud. That video comes next in this article, and you may want to view it first if you have not signed up for DevCloud yet.

The code is all in my xpublog GitHub repository, and it is easiest to try using a Jupyter notebook with the code already in it on DevCloud (instructions are in a later section of this article). You can also get the DPC++ compiler to run the SYCL program yourself, either by downloading a oneAPI toolkit (get the Base Toolkit) or by building it from the LLVM source (there are good instructions for building it).

SYCL is an open standard for single-source C++ data parallel programming of heterogeneous hardware (I call them XPUs - see my earlier explanation). SYCL allows single-source compilation in C++ to target multiple devices on a system, rather than using C++ for the host and domain-specific kernel language(s) for the device(s).
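To make "single source" concrete, here is a minimal sketch of a brightening kernel. It is not the code from the video or the xpublog repository; the names `lighten` and `lighten_image` are hypothetical, and it assumes 8-bit pixel values stored in a flat array. Host code and the device kernel sit in the same C++ function. So that it compiles anywhere, the sketch falls back to a plain loop when no `<sycl/sycl.hpp>` header is found.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a SYCL 2020 implementation ships <sycl/sycl.hpp>.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define XPUBLOG_HAVE_SYCL 1
#endif

// Hypothetical helper: brighten one 8-bit value, clamping to [0, 255].
inline std::uint8_t lighten(std::uint8_t value, int amount) {
    int sum = value + amount;
    return static_cast<std::uint8_t>(sum > 255 ? 255 : (sum < 0 ? 0 : sum));
}

// Hypothetical helper: brighten every value of an image stored as a flat array.
std::vector<std::uint8_t> lighten_image(std::vector<std::uint8_t> pixels, int amount) {
    if (pixels.empty()) return pixels;
#ifdef XPUBLOG_HAVE_SYCL
    sycl::queue q;  // default device selection
    {
        // The buffer wraps the host data for the duration of this scope.
        sycl::buffer<std::uint8_t, 1> buf(pixels.data(), sycl::range<1>(pixels.size()));
        q.submit([&](sycl::handler& h) {
            sycl::accessor acc(buf, h, sycl::read_write);
            // The kernel: one work-item per pixel value, written as an ordinary lambda.
            h.parallel_for(sycl::range<1>(pixels.size()),
                           [=](sycl::id<1> i) { acc[i] = lighten(acc[i], amount); });
        });
    }  // the buffer destructor waits for the kernel and copies results back
#else
    // Fallback without SYCL: the same per-pixel operation as a serial loop.
    for (auto& p : pixels) p = lighten(p, amount);
#endif
    return pixels;
}
```

With DPC++, building the SYCL path typically means passing the `-fsycl` flag to the compiler; with an ordinary C++17 compiler, the fallback loop is what runs.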

SYCL brings to C++ both kernel-style programming and a mechanism to locate, query, and use accelerators in a system. Kernel-based programming is an important programming style for harnessing data parallelism that is also supported in OpenCL and CUDA. An ability to enumerate and access accelerators, in a standard way, was previously introduced by OpenCL. SYCL is an effort that grew out of the OpenCL experience coupled with a desire to strongly support C++ programming. The SYCL standard was introduced by Codeplay and is managed by the Khronos Group.
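The "locate and query" part can be sketched in a few lines. The function name `list_devices` is hypothetical; the SYCL calls it uses (`sycl::platform::get_platforms`, `get_devices`, and `get_info`) are part of the standard. As an assumption for portability, the sketch returns a single placeholder entry when it is built without a SYCL header, so the output in that case says nothing about real accelerators.

```cpp
#include <string>
#include <vector>

// Assumption: a SYCL 2020 implementation ships <sycl/sycl.hpp>.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define XPUBLOG_HAVE_SYCL 1
#endif

// Hypothetical helper: one "platform: device" string per device SYCL can see.
std::vector<std::string> list_devices() {
    std::vector<std::string> names;
#ifdef XPUBLOG_HAVE_SYCL
    for (const auto& platform : sycl::platform::get_platforms()) {
        for (const auto& device : platform.get_devices()) {
            names.push_back(platform.get_info<sycl::info::platform::name>() + ": " +
                            device.get_info<sycl::info::device::name>());
        }
    }
#else
    // Placeholder only: without SYCL there is nothing to enumerate.
    names.push_back("host CPU (built without SYCL)");
#endif
    return names;
}
```

Printing each returned string is a quick way to see what a given system offers; the names themselves depend entirely on the installed drivers and runtimes.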

DPC++ is an LLVM project to implement SYCL with some extensions. DPC++ has been used to prototype many features that are now in SYCL 2020, and therefore had a head start in implementing much of SYCL 2020 even before the ink was dry on the standard. Work remains to complete alignment with the entire SYCL 2020 specification, and all the work is easy to observe in the very active open source repository. DPC++ is used by Intel to target Intel CPUs, GPUs, and FPGAs. DPC++ is used by Codeplay to target Nvidia GPUs. Other efforts, including hipSYCL, can utilize DPC++. I will discuss this more in a future post; using LLVM in support of SYCL is a popular idea that will continue to see a lot of development activity.

Find more in our book

Chapter 1 in our DPC++ & SYCL book introduces SYCL and DPC++ in more depth. Future posts will venture into topics that come later in the book. My posts aim to add more color, and updates as needed, to complement the content in our book.

SYCL and DPC++ Book - free download

These additional items are useful references especially after you have read the book:

the new SYCL 2020 Reference card (16 pages) is a handy reference;

the online DPC++ reference can be useful for interface details;

and the official SYCL 2020 language specification.

SYCL 2020 is key

The SYCL 2020 specification is the product of years of specification development from many dedicated individuals from around the industry. SYCL 2020 builds on the functionality of SYCL 1.2.1 to provide improved programmability, smaller code size and increased performance. Based on C++17, SYCL 2020 enables easier acceleration of standard C++ applications and drives a closer alignment with the ISO C++ roadmap.
 
The Khronos Group highlighted, in their SYCL 2020 announcement, seven key new SYCL enhancements:

  • Unified Shared Memory (USM) enables code with pointers to work naturally without buffers or accessors
  • Parallel reductions add a built-in reduction operation to avoid boilerplate code and achieve maximum performance on hardware with built-in reduction operation acceleration
  • Work group and subgroup algorithms add efficient parallel operations between work items
  • Class template argument deduction (CTAD) and template deduction guides simplify class template instantiation
  • Simplified use of Accessors with a built-in reduction operation reduces boilerplate code and streamlines the use of C++ software design patterns
  • Expanded interoperability enables efficient acceleration by diverse backend acceleration APIs
  • SYCL atomic operations are now more closely aligned to standard C++ atomics to enhance parallel programming freedom
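Two of these enhancements, USM and parallel reductions, combine naturally. The following is a minimal sketch, assuming a device that supports shared USM allocations and an implementation that supports reductions over a simple `range`; the function name `sum_values` is hypothetical. As with any code meant to compile without a SYCL toolchain, it falls back to `std::accumulate` when no `<sycl/sycl.hpp>` header is found.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Assumption: a SYCL 2020 implementation ships <sycl/sycl.hpp>.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define XPUBLOG_HAVE_SYCL 1
#endif

// Hypothetical helper: add up all values, on a device when SYCL is available.
int sum_values(const std::vector<int>& values) {
    if (values.empty()) return 0;
#ifdef XPUBLOG_HAVE_SYCL
    sycl::queue q;
    // USM: plain pointers usable on both host and device, no buffers or accessors.
    int* data = sycl::malloc_shared<int>(values.size(), q);
    int* total = sycl::malloc_shared<int>(1, q);
    std::copy(values.begin(), values.end(), data);
    *total = 0;  // the reduction combines into this starting value

    // Built-in reduction: the runtime decides how partial sums are combined.
    q.parallel_for(sycl::range<1>(values.size()),
                   sycl::reduction(total, sycl::plus<int>()),
                   [=](sycl::id<1> i, auto& sum) { sum += data[i]; })
        .wait();

    int result = *total;
    sycl::free(data, q);
    sycl::free(total, q);
    return result;
#else
    // Fallback without SYCL: a serial sum over the same data.
    return std::accumulate(values.begin(), values.end(), 0);
#endif
}
```

Note how the kernel reads `data[i]` through an ordinary pointer and never spells out how partial sums are merged; that is the boilerplate the two features are meant to remove.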

Goal = C++

The ultimate goal is to influence the C++ standard. Personally, I'm a huge fan of standardizing common practice. That makes me a huge fan of the slow, careful, and deliberate work of the C++ standards committee. After all, once it's in C++, all compilers need to support it forever (approximately).

The fun is in the exploration, learning, and accomplishment that result in common practice. SYCL 2020, and implementations (there is a good overview of SYCL implementations written by Umar Arshad at ArrayFire), make it real.

Our DPC++ & SYCL book, published three months ahead of the standard, is well aligned with SYCL 2020. I'll discuss some of the changes, and the book errata (located in the book's GitHub), in future blogs over the next few months.

Learn on DevCloud now

In addition to our book, there are some great online resources for learning SYCL and using DPC++.

Intel offers "Learn the essentials of DPC++." This training combines teaching with access to hardware (online) to try it out. The Intel DevCloud for oneAPI is a free online resource that gives access to CPUs, GPUs, and FPGAs that all support SYCL via DPC++. Innovative use of Jupyter notebooks in the training makes this an easy and productive way to learn DPC++. You can always do it "old school" by downloading the DPC++ compiler yourself and following the book examples. Refer to our book for instructions, including pointers to getting the DPC++ compiler in binary form or building it from source code.

Codeplay offers a community edition compiler, SYCL Academy, lots of code examples, blogs, and other material from which to learn. Do not miss what Codeplay has to offer!

Khronos maintains a list of additional resources for learning and using SYCL.

Last year's IWOCL / SYCLcon 2020 conference presentations/talks are all online, a rich source of information and community sharing. From last year also, the Intel oneAPI developer summit 2020 talks are all online, many focused on work using SYCL with DPC++.

The upcoming IWOCL / SYCLcon 2021, and the pre-conference oneAPI summit "at IWOCL21", offer talks, panels, and networking for anyone interested in OpenCL or SYCL. I will definitely be in attendance. I hope you attend as well, and we can bump into each other virtually.

Try now: Free, easy, and very cool

[embed]https://www.youtube.com/watch?v=6lWm3-85RbQ[/embed]

This video shows the steps to get running on DevCloud. All the links are listed in this section.

Intel’s DevCloud is a free place to develop, test, and run workloads on a cluster of the latest Intel hardware and software. I started using it myself while developing the DPC++ book.

In the “Hello SYCL” video, I showed how to build and run my simple program within a Jupyter notebook.

Here is how to get access the first time, as shown in the video. These are the links that I used in the video:

  1. Fill in the web form: Sign up for DevCloud
  2. Get the email with your UUID in it. Copy the UUID (a long alphanumeric string) so you can paste it into the Sign In when requested
  3. Click “Sign In” anywhere you find it on the main Training Page
  4. Scroll to the bottom of the main Training Page and click “Launch Jupyter Notebook”
  5. When you see a page that says “Server not running”, click “Launch Server”
  6. Be patient (under a minute normally)
  7. Click “Terminal” (under “Other”). To copy the files we need, issue this command: /data/oneapi_workshop/get_jupyter_notebooks.sh
  8. Navigate to /xpublog in your navigation window (as shown in the video above)
  9. Open (double click) Welcome.ipynb
  10. Bookmark this page – it may make returning very easy.

When returning:

  1. Try your saved link. It may redirect you to the training page to sign in; if so (or if for any reason you don’t get the Welcome page), click “Sign In” and use your UUID again (my browsers remember my UUID, so I just need to select it from a pull-down in the UUID box)
  2. Scroll to the bottom of the main Training Page and click “Launch Jupyter Notebook”
  3. If you see a page that says “Server not running”, click “Launch Server”; don't worry if you do not have to do this when returning, since you will be prompted only if the server is not running
  4. Be patient (under a minute normally)
  5. No need to copy files this time
  6. Navigate to /xpublog in your navigation window (as shown in the video above)
  7. Open (double click) Welcome.ipynb

After xpublog has been around a while, I plan to create a streamlined page for access. I will update these instructions and the video if and when that happens.

I’ll deal with more of oneAPI in future blogs, but there is a huge amount of wonderful training available already. I recommend taking advantage of the notebooks on DevCloud, because they walk us through the topics with the ability to interact. You can edit the sample programs as you progress, to test your understanding and try out new things. For more information, visit the oneAPI DevCloud training page. I recommend starting by clicking “View training Modules” for the Intel oneAPI Base Toolkit.

Future topics – make suggestions!

I invite you to provide feedback, including any suggestions on future topics. I love to dig in a bit and offer perspectives and meaningful details that are not readily extracted from documentation but rather come from experience and from sharing between developers. I hope to offer my own software engineering version of “The Rest of the Story,” a staple created by Paul Harvey on the radio for years.

In my blogs, I expect to write about software development for XPUs (heterogeneous computing), including oneAPI, SYCL, DPC++, C++, performance, and parallel programming. Don't be surprised to see Python and Fortran in the mix; they are both great tools. With proper encouragement, I’ll dig into any topic that you find interesting and that I can help illuminate.

I look forward to your thoughts, feedback, and a good discussion.

Post comments – please!

Please post comments with your #xpublog thoughts, feedback, and suggestions on community.intel.com James-Reinders-Blog.
