This page provides the current Release Notes for Intel® CPU Runtime for OpenCL™ Applications for Intel® Core™ and Intel® Xeon® processors. This page covers the CPU (x86-64) OpenCL™ implementation only. See the OpenCL™ Runtimes for Intel® Processors article for additional Intel® Graphics Technology information. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.
Each version includes a summary of new features and changes since the previous release. The detailed release notes for each version include important information such as prerequisites, software compatibility, installation instructions, and known issues.
All files are in PDF format; Adobe Reader* (or a compatible reader) is required.
For OpenCL™ developer tools, visit the Intel® SDK for OpenCL™ Applications 2019 page.
For questions or technical support, visit Intel® Software Developer Support.
NOTE: For Intel® Xeon Phi™ coprocessor device support, you must install Intel® MPSS version 3.3. (Deprecated)
- Added subgroup support
- Support for the latest Windows* versions, including Windows* 10, Windows Server 2016*, and Windows Server 2019*
- Support for Linux* distributions, including Ubuntu* 20.04 LTS, Red Hat* Enterprise Linux* 8.1, CentOS* 8.x, and SUSE* 15.x
- The Intel® CPU Runtime for OpenCL™ Applications for Linux is distributed through APT and YUM repositories. Please refer to the Release Notes for installation instructions.
- Bug fixes
- Support for the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) ISA on Intel® Xeon® Platinum processors (formerly code name Skylake)
- Enabled features of OpenCL™ 2.1. The product is based on a published Khronos* Specification and has passed the Khronos Conformance Process. The conformance record can be found at https://www.khronos.org/conformance/adopters/conformant-products/opencl. Refer to submission #322 recorded on October 7, 2018.
- Support for a vectorization width of 16 in the CL_CONFIG_CPU_VECTORIZER_MODE environment and configuration file variable, and in the optional OpenCL™ C kernel attribute intel_vec_len_hint
- Support for OpenCL™ Kernel debugging on Linux* OS with GDB*
- Improved coexistence with the Intel® Graphics Compute Runtime for OpenCL™ Driver when both are installed.
- Changed the platform name returned by the clGetPlatformInfo(...) OpenCL™ API call for the CL_PLATFORM_NAME parameter to "Intel(R) CPU Runtime for OpenCL(TM) Applications"
- New environment variable CL_CONFIG_CPU_TARGET_ARCH. It makes the compiler generate code exclusively for the given target CPU architecture. It can only lower the instruction set level below what the CPU supports, not raise it. Allowed values are:
  - Generates code for processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation instructions, Intel® AVX-512 Conflict Detection instructions, Intel® AVX-512 Doubleword and Quadword instructions, Intel® AVX-512 Byte and Word instructions, and Intel® AVX-512 Vector Length Extensions for Intel® processors, as well as the instructions enabled with core-avx2.
  - Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
  - Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
  - Generates code for processors that support Intel® SSE4.2 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
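As a minimal sketch, the variable can also be set from the host program rather than from the shell. This assumes the runtime reads CL_CONFIG_CPU_TARGET_ARCH when it initializes (so the variable must be set before the first OpenCL API call) and that core-avx2, the name the AVX-512 entry above refers to, is an accepted value. set_cpu_target_arch is a hypothetical helper, not part of the runtime.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: sets CL_CONFIG_CPU_TARGET_ARCH for the current
 * process. Assumption: the runtime reads this variable when it initializes,
 * so call this before the first OpenCL API call (e.g. clGetPlatformIDs).
 * Returns 0 on success, -1 on failure.
 */
static int set_cpu_target_arch(const char *arch)
{
    if (arch == NULL || arch[0] == '\0') {
        return -1; /* refuse empty values */
    }
#ifdef _WIN32
    return _putenv_s("CL_CONFIG_CPU_TARGET_ARCH", arch) == 0 ? 0 : -1;
#else
    /* POSIX setenv; the final 1 overwrites any value inherited from the shell. */
    return setenv("CL_CONFIG_CPU_TARGET_ARCH", arch, 1) == 0 ? 0 : -1;
#endif
}
```

For example, calling set_cpu_target_arch("core-avx2") at the top of main(), before any OpenCL call, would restrict generated code to AVX2 even on an AVX-512 machine. Overwriting an inherited value is a deliberate choice here; pass 0 instead of 1 to setenv if the shell setting should win.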
- Fixed an issue with user functions not being inlined in programs created using clCreateProgramWithIL(...) OpenCL™ API call
- Fixed incorrectly reported CL_DEVICE_MAX_COMPUTE_UNITS for multi-socket Intel® Xeon® systems (reported on forum https://software.intel.com/en-us/forums/opencl/topic/702240)
- Fixed an incompatibility with the Intel® Threading Building Blocks (Intel® TBB) max_allowed_parallelism parameter
- Fixed an issue with CL_DRIVER_VERSION returning an incorrect driver version
- Improved OpenCL™ C compiler diagnostics
- Minor bug fixes
- Updated the compiler infrastructure to LLVM* version 6.0
- Intel® CPU Runtime for OpenCL™ Applications 18.1 supports the CPU only. For Intel® Xeon Phi™ coprocessor support, use version 14.2. For more information, see the OpenCL™ runtime entry and release notes on the OpenCL™ driver page at /content/www/us/en/develop/articles/opencl-drivers.html.
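Because the platform name changed in this release, host code that selects the CPU platform by name may need updating. The sketch below matches the new name exactly. find_intel_cpu_platform is a hypothetical helper, and the names array stands in for strings obtained with clGetPlatformInfo(..., CL_PLATFORM_NAME, ...) so that the example compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <string.h>

/* Name reported through CL_PLATFORM_NAME, per the release note above. */
static const char INTEL_CPU_PLATFORM_NAME[] =
    "Intel(R) CPU Runtime for OpenCL(TM) Applications";

/*
 * Hypothetical helper: returns the index of the first platform whose name
 * exactly matches INTEL_CPU_PLATFORM_NAME, or -1 if none does. In a real
 * host program, names[i] would be filled by calling
 *   clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, size, buf, NULL);
 * for each platform returned by clGetPlatformIDs.
 */
static int find_intel_cpu_platform(const char *const names[], size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (names[i] != NULL && strcmp(names[i], INTEL_CPU_PLATFORM_NAME) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Matching on an exact string is brittle across releases; checking CL_PLATFORM_VENDOR together with the device type (CL_DEVICE_TYPE_CPU) is a more robust alternative.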
- New optional __attribute__((intel_vec_len_hint(<uint>)))
- This attribute can be used to provide a hint to the compiler that the kernel will perform best if vectorized to the specified vector length.
- You can specify one of the following lengths for this attribute:
  - The compiler uses heuristics to decide whether to vectorize the kernel and, if so, which vector length to use. This is the default behavior.
  - No vectorization is performed by the compiler. Explicit vector data types in kernels are left intact.
  - Disables heuristics and vectorizes to a length of 4.
  - Disables heuristics and vectorizes to a length of 8.
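A sketch of how the attribute might appear in kernel source, built as a C string that would be passed to clCreateProgramWithSource. The vec_scale kernel and the build_hinted_kernel_source helper are hypothetical, and the attribute placement assumes the usual OpenCL C position for kernel attributes (as with vec_type_hint); check the runtime's developer guide for the exact rules.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes OpenCL C source for a simple kernel carrying
 * intel_vec_len_hint(vec_len) into buf. Assumption: the attribute goes
 * between __kernel and the return type, like other OpenCL C kernel
 * attributes. Returns the snprintf result (length, or negative on error).
 */
static int build_hinted_kernel_source(char *buf, size_t size, unsigned vec_len)
{
    return snprintf(buf, size,
        "__kernel __attribute__((intel_vec_len_hint(%u)))\n"
        "void vec_scale(__global float *data, float factor)\n"
        "{\n"
        "    size_t i = get_global_id(0);\n"
        "    data[i] *= factor;\n"
        "}\n",
        vec_len);
}
```

Generating the source at run time makes it easy to try several hint values (for example 4 and 8) against the same kernel body and compare timings; a fixed string literal works just as well once a value is chosen.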
- New OpenCL™ C predefined macro __INTEL_OPENCL_CPU_<CPUSIGN>
- This macro can be used to fine-tune the kernel for a specific CPU device microarchitecture. <CPUSIGN> is the CPU signature of the device.
- <CPUSIGN> corresponds to one of the following microarchitectures:
  - Intel® microarchitecture code name Skylake
  - Intel® microarchitecture code name Skylake on Intel Xeon® processor family
  - Intel® microarchitecture code name Broadwell
  - Intel® microarchitecture code name Broadwell on Intel Xeon® processor family
  - Intel® microarchitecture code name Haswell
  - Intel® microarchitecture code name Haswell on Intel Xeon® processor family
  - Intel® microarchitecture code name Ivy Bridge
  - Intel® microarchitecture code name Ivy Bridge on Intel Xeon® processor family
  - Intel® microarchitecture code name Sandy Bridge
  - Intel® microarchitecture code name Sandy Bridge on Intel Xeon® processor family
  - Intel® microarchitecture code name Westmere
  - Intel® microarchitecture code name Westmere on Intel Xeon® processor family
- Improved heuristics for choosing the local size when an NDRange is enqueued to a command queue created with the CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL property (an Intel® extension)
- Fixed an issue where an incorrect library was loaded when running on Intel® microarchitecture code name Skylake.
- Fixed the known incompatibility between the CPU Kernel Debugger from the Intel® SDK for OpenCL™ Applications 2016 R2 and the CPU-only runtime package version 16.1.
- Performance optimizations:
- Compiler vectorizer heuristic tuning for a set of workloads
- Workgroup fusion optimization improvements
- Performance enhancements of the vload()/vstore() built-in functions
- Fixed an issue reported on the forum (https://software.intel.com/en-us/comment/1844607#comment-1844607) where the vectorizer produced incorrect code on SSE4.2 architectures when the samplerless read_imagef() built-in function was called with image2d_t and int2 coordinate arguments.
- The cl_khr_gl_sharing extension was disabled due to incompatibility with the Microsoft* Basic Display Adapter. To use this extension, install the OpenCL Driver for Intel® Iris™ Graphics and HD Graphics for Windows* OS from /content/www/us/en/develop/articles/opencl-drivers.html#iris. The driver package includes the OpenCL Runtime package for CPUs.
- Due to a performance bug, the Threading Building Blocks (TBB) library was downgraded from "4.2, Interface version 7001, Oct 2 2013" to "4.2, Interface version 7005, Jun 1 2014".
- Support for Intel® Core™ 6th generation and Xeon® v4 processors (formerly Intel microarchitecture code name Broadwell)
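Since cl_khr_gl_sharing may be absent from the CPU-only package, applications should check for it at run time rather than assume it. Below is a minimal sketch of a whole-token check against the space-separated string returned for CL_DEVICE_EXTENSIONS (or CL_PLATFORM_EXTENSIONS); has_extension is a hypothetical helper, and the extension string is passed in directly so the example compiles without the OpenCL headers.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` appears as a whole,
 * space-delimited token in `extensions`, e.g. the string returned by
 *   clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, size, buf, NULL);
 * A plain strstr() is not enough: it would also match a name that is only
 * a prefix of a longer extension name.
 */
static bool has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0) {
        return false;
    }
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == extensions) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends) {
            return true;
        }
        p += len; /* extension names contain no spaces, so skipping is safe */
    }
    return false;
}
```

A call such as has_extension(ext_string, "cl_khr_gl_sharing") would let the application fall back to a non-interop path when the extension is not reported.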
- Support for OpenCL™ 2.0 specification
- Improved cross-CPU support for pre-compiled kernel binaries in the runtime:
- Enables loading pre-generated kernel binaries, which saves OpenCL program build time. For more information, see https://software.intel.com/en-us/node/540584
- Enables the Intel® SDK for OpenCL™ - Offline Compiler to generate a JIT binary for a target CPU model. For more information, see https://software.intel.com/en-us/node/539388
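A minimal sketch of caching a program binary on disk so later runs can skip compiling from source. The save_binary and load_binary helpers are hypothetical and compile without the OpenCL headers; the OpenCL calls that would surround them (clGetProgramInfo with CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES, then clCreateProgramWithBinary and clBuildProgram) appear only in comments.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical file helpers for caching a program binary between runs.
 * Typical flow (OpenCL calls shown as comments only):
 *   first run:  clBuildProgram(...);
 *               clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, ...);
 *               clGetProgramInfo(prog, CL_PROGRAM_BINARIES, ...);
 *               save_binary(path, bin, size);
 *   later runs: bin = load_binary(path, &size);
 *               prog = clCreateProgramWithBinary(ctx, 1, &dev, &size,
 *                   (const unsigned char **)&bin, &status, &err);
 *               clBuildProgram(...);  still required by the API
 */
static int save_binary(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(data, 1, size, f);
    int closed = fclose(f);
    return (written == size && closed == 0) ? 0 : -1;
}

/* Returns a malloc'd buffer (caller frees) and stores its length in *size,
   or returns NULL on failure. */
static unsigned char *load_binary(const char *path, size_t *size)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(f); return NULL; }
    if (fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size = (size_t)len;
    return buf;
}
```

Keying the cache file on the device name and CL_DRIVER_VERSION is a sensible precaution, so that a runtime update or a different CPU triggers a rebuild from source instead of loading a stale binary.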
- Bug and memory leak fixes.
- Compiler infrastructure was updated to LLVM version 3.6.2
- Removed support for the Intel® Xeon Phi™ coprocessors
- New performance-related environment variables:
  - CL_CONFIG_CPU_RT_LOOP_UNROLL_FACTOR: loop unrolling of loops with a non-constant trip count (CPU only)
  - CL_CONFIG_USE_FAST_RELAXED_MATH: enables computations with floating-point calculation optimizations (forcing
- Improved Microsoft Visual Studio* debugging of OpenCL kernels on CPU device
- Bug and memory leak fixes
- Several performance enhancements, including better auto-vectorization and alias analysis of OpenCL kernels for the CPU device
- Added support for offline kernel compilation and kernel binary distribution on Intel® Xeon Phi™ coprocessors. With this release, on both Intel® Xeon Phi™ coprocessor and Intel CPU, the kernel binary is the final executable binary in contrast to the previous release, where the kernel binary on Intel Xeon Phi coprocessor was an intermediate code.
- Improved kernel invocation time on the Intel Xeon Phi coprocessor device when kernel commands are batched into in-order queues
- Optimized compiler vectorizer
- New feature: user logger for API tracing and for debugging functional failures in OpenCL applications
- New environment variable
- SPIR is now conformant on Intel Xeon Phi coprocessor
- Bug fixes
- Support for OpenCL Standard Portable Intermediate Representation (SPIR) 1.2 consumption.
- Intel® Manycore Platform Software Stack (Intel® MPSS) 3.2 and 3.2.3 support.
NOTE: Using OpenCL Runtime 14.1 with MPSS 3.2.1 is not recommended, as this combination introduces stability issues.
- Performance improvements:
- Faster execution of code dominated by statically diverging dynamically uniform branches
- More efficient event traversing algorithm
- NO_DMA mode is now the default (no longer a preview feature), which improves buffer creation speed
- Improved device side memory pool control
- CPU only: Starting with this release, the kernel binary is the final machine code. This enables creating the kernel binary offline and distributing it with the application binary, and it eliminates compilation time on the end user's system (clCreateProgramWithBinary).
- Bug fix (Intel® Xeon Phi™ coprocessors only): fixed a compilation crash when a struct is defined at global scope in the .cl file.
- New performance-related environment variables on Intel Xeon Phi coprocessors; see the user guide for details.
- Added 32-bit version of the runtime for Windows OS.
- Added OpenCL CPU device support on Intel Core™ processors.