Use these courses to get up to speed on Data Parallel C++ (DPC++) and on using the oneAPI Toolkits and their components to achieve cross-platform, heterogeneous compute.
| Course | Requirement |
| --- | --- |
| Introducing oneAPI: A Unified, Cross-Architecture Performance Programming Model | Mandatory |
| Intel® DevCloud Tutorial | Mandatory |
| Data Parallel C++ Program Structures | Mandatory |
| Data Parallel C++ New Features | Mandatory |
| Develop in a Heterogeneous Environment with Intel® oneAPI Math Kernel Library | Optional |
| Intel® oneAPI Threading Building Blocks: Optimizing for NUMA Architectures | Optional |
The drive for compute innovation is as old as computing itself, with each advancement built upon what came before. In 2019 and 2020, a primary focus of next-gen compute innovation has been to enable increasingly complex workloads to run on multiple architectures, including CPUs, GPUs, FPGAs, and AI accelerators.
Historically, writing and deploying code for a CPU and a GPU or other accelerator has required separate code bases, libraries, languages, and tools. oneAPI was created to solve this challenge.
Kent Moffat, software specialist and Intel senior product manager, presents:
Develop, run, and optimize your Intel® oneAPI solution in the Intel® DevCloud—a free development sandbox to learn about and program oneAPI cross-architecture applications. Get full access to the latest Intel CPUs, GPUs, and FPGAs, Intel® oneAPI Toolkits, and the new programming language, Data Parallel C++ (DPC++).
Some of the lessons and training materials use the Intel DevCloud as a platform to host the training and to practice what you've learned.
This module introduces DPC++ program structure and focuses on the key SYCL* classes used to write basic DPC++ code that offloads work to accelerator devices.
This module introduces some of the extensions added to DPC++, such as Unified Shared Memory (USM), in-order queues, and sub-groups. It will be updated as new extensions reach public releases.
Peter Caday, math algorithm engineer at Intel, discusses how oneMKL enables developers to program with GPUs beyond the traditional CPU-only support.
Threading Building Blocks (TBB) is a high-level C++ template library for parallel programming that was originally developed as a composable, scalable solution for multicore platforms. Separately, in the realm of high-performance computing, multisocket Non-Uniform Memory Access (NUMA) systems are typically used with OpenMP*.
Increasingly, many independent software components require parallelism within a single application, especially in the AI, video processing, and rendering domains. In such environments, performance can degrade unless each component's parallelism composes well with the rest.
The result is that many developers have pulled TBB into NUMA environments—a complex task for even the most seasoned programmers.
Intel is working to simplify the approach; this training shows how.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804