Ahead of Time Compilation
Ahead of Time (AOT) compilation is useful during development or at distribution time, when you know in advance which device your application will run on. The AOT feature provides the following benefits:
- No additional compilation time is incurred when running your application.
- No just-in-time (JIT) compilation bugs are encountered on the target device, because the JIT step is skipped with AOT compilation.
- Your final code, executing on the target device, can be tested as-is before you deliver it to end-users.
A program built with AOT compilation for a specific target device will not run on a different device. You must detect the proper target device at runtime and report an error if the targeted device is not present. The use of exception handling with an asynchronous exception handler is recommended.
Data Parallel C++ (DPC++) supports AOT compilation for the following targets: Intel® CPUs, Intel® Processor Graphics (Gen9 or above), and Intel® FPGA.
Prerequisites
To use the AOT feature for targeting a GPU, you must have the OCLOC tool installed. To install the tool on your operating system, refer to the Install OpenCL™ Offline Compiler (OCLOC) section of the Intel® oneAPI Toolkit Installation Guide.
How to Use AOT for the Target Device (Intel® CPUs)
The supported options are:
- -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice
- -Xs "-march=<arch>", where <arch> is one of the following:
- avx512 (Enables the support of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) foundation, conflict detection, doubleword and quadword, byte and word, vector length extensions for Intel® Architecture Processors, and instructions enabled with -march=avx2)
- avx2 (Enables the support of Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions (Intel® AVX), Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2), Intel® SSE4.1, Intel® SSE3, Intel® SSE2, Intel® SSE, and Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions)
- avx (Enables the support of Intel® AVX, Intel® SSE4.2, Intel® SSE4.1, Intel® SSE3, Intel® SSE2, Intel® SSE, and SSSE3 instructions)
- sse4.2 (Enables the support of Intel® SSE4.2 efficient accelerated string and text processing instructions, Intel® SSE4 vectorizing compiler and media accelerator, Intel® SSE3, Intel® SSE2, Intel® SSE, and SSSE3 instructions)
Examples:
- Linux*: dpcpp -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs "-march=avx2" main.cpp
- Windows*: dpcpp /EHsc -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs "-march=avx2" test_cpu.cpp
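When the Linux build machine is representative of the target CPU, one way to choose a value for <arch> is to inspect the feature flags the kernel reports in /proc/cpuinfo. The following is a minimal sketch under that assumption, not part of the compiler: has_flag and pick_march are hypothetical helper names, and the sketch relies on the Linux flag names avx512f, avx512cd, avx512dq, avx512bw, avx512vl, avx2, avx, and sse4_2.

```shell
#!/bin/sh
# has_flag FLAGS NAME (hypothetical helper):
# succeeds if NAME appears as a whole space-separated word in FLAGS.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# pick_march FLAGS (hypothetical helper):
# prints the most capable -march value from the list above that FLAGS supports,
# or returns 1 if none of them applies.
pick_march() {
    # avx512 requires all five AVX-512 subsets named in the option description.
    if has_flag "$1" avx512f && has_flag "$1" avx512cd &&
       has_flag "$1" avx512dq && has_flag "$1" avx512bw &&
       has_flag "$1" avx512vl; then
        echo "avx512"
    elif has_flag "$1" avx2; then
        echo "avx2"
    elif has_flag "$1" avx; then
        echo "avx"
    elif has_flag "$1" sse4_2; then
        echo "sse4.2"
    else
        return 1
    fi
}

# Possible usage on the build machine (assumption: it matches the target CPU):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   dpcpp -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice \
#         -Xs "-march=$(pick_march "$flags")" main.cpp
```

If the build machine differs from the deployment machine, pass the value for the deployment CPU explicitly instead of detecting it.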
Building an Application with Multiple Source Files for CPU Targeting
Method 1:
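Compile your normal files (with no DPC++ kernels) to create host objects. Then compile the file with the kernel code and link it with the host objects: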
- Linux:
- dpcpp -c main.cpp
- dpcpp -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs "-march=avx2" mandel.cpp main.o
- Windows:
- dpcpp -c /EHsc main.cpp
- dpcpp /EHsc -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs "-march=avx2" mandel.cpp -link main.obj
Method 2:
Compile the file with the kernel code first, and create a fat object. Then compile the rest of the files, and do the linking in one command line to create a fat executable:
- Linux:
- dpcpp -c -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs "-march=avx2" mandel.cpp
- dpcpp main.cpp mandel.o
- Windows:
- dpcpp -c /EHsc -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs "-march=avx2" mandel.cpp
- dpcpp /EHsc main.cpp mandel.obj
Currently, Method 2 only works on a HOST selector.
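The Linux steps of Method 1 can be generalized to any number of host-only source files. The following is a minimal sketch that only prints the commands rather than running them. The function name method1_commands and its argument order are hypothetical, and the sketch assumes that dpcpp -c writes each object file to the current directory under the source file's base name.

```shell
#!/bin/sh
# method1_commands KERNEL_SRC MARCH HOST_SRC... (hypothetical dry-run helper)
# Prints the Method 1 command sequence for Linux without executing it.
method1_commands() {
    kernel_src=$1
    march=$2
    shift 2
    objs=""
    for src in "$@"; do
        # Step 1: compile each host-only file to an object file.
        echo "dpcpp -c $src"
        objs="$objs $(basename "$src" .cpp).o"
    done
    # Step 2: compile the kernel file with the AOT options and link the host objects.
    echo "dpcpp -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice -Xs \"-march=$march\" $kernel_src$objs"
}

# Example: method1_commands mandel.cpp avx2 main.cpp
```

Printing the commands first makes it easy to review the sequence before piping it to a shell or pasting it into a build script.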
How to Use AOT for Intel® Integrated Graphics (Gen9 or Above)
The supported options are:
- -fsycl-targets=spir64_gen-unknown-unknown-sycldevice
- -Xs '-device <arch>' option, where <arch> is one of the following (only the common architectures are listed):
- Gen12LP (for all DG1-based graphics)
- dg1 (for Intel® Iris® Xe MAX)
- Gen9 (for all Gen9-based graphics)
- cfl (Coffee Lake with Gen9 graphics)
- glk (Gemini Lake with Gen9 graphics)
- icllp (Ice Lake with Gen11 graphics)
- kbl (Kaby Lake with Gen9 graphics)
- lkf (Lakefield with Gen11 graphics)
- skl (Intel® microarchitecture code name Skylake with Gen9 graphics)
To see all the device types, use the following command:
ocloc compiler --help
If multiple target devices are listed, the Intel® oneAPI DPC++/C++ Compiler compiles for each of these targets and creates a fat binary that contains all of the device binaries produced this way.
Examples of supported -device patterns:
- Linux:
- To compile for a single target, using skl as an example, use: dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs '-device skl' vector-add.cpp
- To compile for two targets, using skl and icllp as examples, use: dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs '-device skl,icllp' vector-add.cpp
- To compile for all the targets known to OCLOC, use: dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs '-device *' vector-add.cpp
- Windows:
- To compile for a single target, using skl as an example, use: dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs '-device skl' vector-add.cpp
- To compile for two targets, using skl and icllp as examples, use: dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs '-device skl,icllp' vector-add.cpp
- To compile for all the targets known to OCLOC, use: dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice -Xs '-device *' vector-add.cpp
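When the list of GPU targets comes from a build configuration rather than being typed by hand, the comma-separated -device value can be assembled in a script. The following is a minimal sketch for a POSIX shell. The function name gen_device_option is hypothetical, and the sketch does not check the names against the architectures OCLOC actually knows.

```shell
#!/bin/sh
# gen_device_option DEVICE... (hypothetical helper)
# Joins the given device names with commas and prints the string to pass to -Xs.
# Returns 1 if no device name is given.
gen_device_option() {
    [ "$#" -gt 0 ] || return 1
    list=""
    for dev in "$@"; do
        if [ -z "$list" ]; then
            list=$dev
        else
            list="$list,$dev"
        fi
    done
    printf '%s\n' "-device $list"
}

# Possible usage:
#   dpcpp -fsycl-targets=spir64_gen-unknown-unknown-sycldevice \
#         -Xs "$(gen_device_option skl icllp)" vector-add.cpp
```

Quoting the command substitution keeps the whole -device string as a single argument to -Xs, as in the examples above.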