This page provides the current Release Notes for Intel® CPU Runtime for OpenCL™ Applications for Intel® Core™ and Intel® Xeon® processors. This page covers the CPU (x86-64) OpenCL™ implementation only. See the OpenCL™ Runtimes for Intel® Processors article for additional Intel® Graphics Technology information. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.
Click a version to expand a summary of the new features and changes since the previous release. Each version also provides download buttons for the detailed release notes, which cover prerequisites, software compatibility, installation instructions, and known issues.
All files are in PDF format - Adobe Reader* (or compatible) required.
For OpenCL™ developer tools, visit the Intel® SDK for OpenCL™ Applications 2019 page.
For questions or technical support, visit Intel® Software Developer Support.
NOTE: Intel® Xeon Phi™ coprocessor device support (deprecated) requires Intel® MPSS version 3.3.
- Support of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) ISA on Intel® Xeon® Platinum processor (formerly code name Skylake)
- Enabled features of OpenCL™ 2.1. The product is based on a published Khronos* Specification and has passed the Khronos Conformance Process. The conformance record can be found at https://www.khronos.org/conformance/adopters/conformant-products/opencl. Refer to submission #322 recorded on October 7, 2018.
- Support for vectorization width 16 for the environment and configuration file variable CL_CONFIG_CPU_VECTORIZER_MODE, as well as for OpenCL™ C optional kernel attribute intel_vec_len_hint
- Support for OpenCL™ Kernel debugging on Linux* OS with GDB*
- Improved coexistence support with Intel® Graphics Compute Runtime for OpenCL™ Driver when both are installed.
- Changed the platform name returned by the clGetPlatformInfo(...) OpenCL™ API call for the CL_PLATFORM_NAME parameter to “Intel(R) CPU Runtime for OpenCL(TM) Applications”
- New environment variable CL_CONFIG_CPU_TARGET_ARCH. It generates code exclusively for the given target CPU architecture. The variable can only lower the instruction set level below what the CPU supports; it cannot raise it. Each allowed value selects one of the following instruction set levels:
  - Generates code for processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation instructions, Intel® AVX-512 Conflict Detection instructions, Intel® AVX-512 Doubleword and Quadword instructions, Intel® AVX-512 Byte and Word instructions, and Intel® AVX-512 Vector Length Extensions for Intel® processors, plus the instructions enabled with core-avx2.
  - Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
  - Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
  - Generates code for processors that support Intel® SSE4.2 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
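One way to apply CL_CONFIG_CPU_TARGET_ARCH from inside an application is to set it before the first OpenCL call. The sketch below is a minimal POSIX-only illustration. The helper name `request_target_arch` is hypothetical, the assumption that the runtime reads the variable at initialization is not confirmed by these notes, and the value string `core-avx2` is taken from the wording of the descriptions above rather than from a confirmed list of value names.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Variable name as given in the release note. */
#define TARGET_ARCH_VAR "CL_CONFIG_CPU_TARGET_ARCH"

/*
 * Hypothetical helper: request a lower instruction set level for the
 * CPU runtime. Call it before the first OpenCL API call in the process
 * (assumption: the runtime reads the variable when it initializes).
 * Returns 0 on success and -1 on failure, like setenv().
 */
int request_target_arch(const char *arch)
{
    assert(arch != NULL && strlen(arch) > 0);
    /* Third argument 1: overwrite any value inherited from the shell. */
    return setenv(TARGET_ARCH_VAR, arch, 1);
}
```

A call such as `request_target_arch("core-avx2")` has the same effect as exporting the variable in the shell before launching the application. On Windows, `_putenv_s` would take the place of `setenv`.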
- Fixed an issue with user functions not being inlined in programs created using clCreateProgramWithIL(...) OpenCL™ API call
- Fixed incorrectly reported CL_DEVICE_MAX_COMPUTE_UNITS for multi-socket Intel® Xeon® systems (reported on forum https://software.intel.com/en-us/forums/opencl/topic/702240)
- Fixed incompatibility with Intel® Threading Building Blocks (Intel® TBB) max_allowed_parallelism parameter
- Fixed an issue with CL_DRIVER_VERSION returning incorrect driver version
- Improved OpenCL™ C compiler diagnostics
- Minor bug fixes
- Updated the compiler infrastructure to LLVM* version 6.0
- Intel® CPU Runtime for OpenCL™ Applications 18.1 supports CPU only. For Intel® Xeon Phi™ coprocessor support, use version 14.2. For more information, see the OpenCL™ runtime entry and release notes on the OpenCL™ driver page at https://software.intel.com/en-us/articles/opencl-drivers.
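Applications that select a platform by its name are affected by the CL_PLATFORM_NAME change listed above. The sketch below shows a string comparison against the new name. The helper name `is_intel_cpu_runtime` is hypothetical. In a real program the `name` argument would be the buffer filled by `clGetPlatformInfo(platform, CL_PLATFORM_NAME, size, buffer, NULL)`; that call is left out here so the example builds without the OpenCL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Platform name reported by this release, as quoted in the note above. */
static const char INTEL_CPU_PLATFORM_NAME[] =
    "Intel(R) CPU Runtime for OpenCL(TM) Applications";

/*
 * Hypothetical helper: returns true when `name` is exactly the platform
 * name of the CPU runtime. `name` must be a NUL-terminated string.
 */
bool is_intel_cpu_runtime(const char *name)
{
    assert(name != NULL);
    return strcmp(name, INTEL_CPU_PLATFORM_NAME) == 0;
}
```

Code that still compares against an older platform name will no longer match this runtime, so an exact comparison like this one needs updating whenever the reported name changes.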
- New optional __attribute__((intel_vec_len_hint(<uint>)))
- This attribute can be used to provide a hint to the compiler that the kernel will perform best if vectorized to the specified vector length.
- You can specify one of the following lengths for this attribute:
  - 0: The compiler uses heuristics to decide whether to vectorize the kernel, and if so, which vector length to use. This is the default behavior.
  - 1: No vectorization is performed by the compiler. Explicit vector data types in kernels are left intact.
  - 4: Disables heuristics and vectorizes to a length of 4.
  - 8: Disables heuristics and vectorizes to a length of 8.
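The attribute is written on the kernel declaration in OpenCL C. The sketch below holds such a kernel as a C string, the way a host program would before handing it to `clCreateProgramWithSource`. The kernel name `scale`, the helper `scale_kernel_source`, and the attribute's position after `__kernel` (modeled on the standard `vec_type_hint` attribute) are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * OpenCL C source for an illustrative kernel. intel_vec_len_hint(8) asks
 * the compiler to skip its heuristics and vectorize to a length of 8.
 */
static const char SCALE_KERNEL_SRC[] =
    "__kernel __attribute__((intel_vec_len_hint(8)))\n"
    "void scale(__global float *data, float factor)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] = data[i] * factor;\n"
    "}\n";

/*
 * Hypothetical helper: returns the kernel source and stores its length,
 * matching the `strings` and `lengths` arguments of
 * clCreateProgramWithSource.
 */
const char *scale_kernel_source(size_t *length_out)
{
    assert(length_out != NULL);
    *length_out = strlen(SCALE_KERNEL_SRC);
    return SCALE_KERNEL_SRC;
}
```

Changing the argument to 0 restores the default heuristic behavior, and 1 turns vectorization off, per the list above.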
- New OpenCL™ C predefined macro __INTEL_OPENCL_CPU_<CPUSIGN>
- This macro can be used to fine tune the kernel for a specific CPU device microarchitecture. <CPUSIGN> is the CPU signature of a device.
- You can specify one of the following values for this macro:
  - __INTEL_OPENCL_CPU_SKL__: Intel® microarchitecture code name Skylake
  - __INTEL_OPENCL_CPU_SKX__: Intel® microarchitecture code name Skylake on Intel® Xeon® processor family
  - __INTEL_OPENCL_CPU_BDW__: Intel® microarchitecture code name Broadwell
  - __INTEL_OPENCL_CPU_BDW_XEON__: Intel® microarchitecture code name Broadwell on Intel® Xeon® processor family
  - __INTEL_OPENCL_CPU_HSW__: Intel® microarchitecture code name Haswell
  - __INTEL_OPENCL_CPU_HSW_XEON__: Intel® microarchitecture code name Haswell on Intel® Xeon® processor family
  - __INTEL_OPENCL_CPU_IVB__: Intel® microarchitecture code name Ivy Bridge
  - __INTEL_OPENCL_CPU_IVB_XEON__: Intel® microarchitecture code name Ivy Bridge on Intel® Xeon® processor family
  - __INTEL_OPENCL_CPU_SNB__: Intel® microarchitecture code name Sandy Bridge
  - __INTEL_OPENCL_CPU_SNB_XEON__: Intel® microarchitecture code name Sandy Bridge on Intel® Xeon® processor family
  - __INTEL_OPENCL_CPU_WST__: Intel® microarchitecture code name Westmere
  - __INTEL_OPENCL_CPU_WST_XEON__: Intel® microarchitecture code name Westmere on Intel® Xeon® processor family
  - __INTEL_OPENCL_CPU_UNKNOWN__: Unknown microarchitecture
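Because the macro is predefined by the OpenCL™ C compiler, a kernel can branch on it with ordinary preprocessor conditionals. The sketch below stores such a kernel as a C string for use with `clCreateProgramWithSource`. The kernel name `fill_tile`, the `TILE` constant and its values, and the helper `tuned_kernel_source` are hypothetical and only show the shape of the technique.

```c
#include <assert.h>
#include <string.h>

/*
 * OpenCL C source that picks a tuning constant per microarchitecture.
 * The TILE values are placeholders, not measured recommendations.
 */
static const char TUNED_KERNEL_SRC[] =
    "#if defined(__INTEL_OPENCL_CPU_SKX__)\n"
    "#define TILE 16\n"
    "#elif defined(__INTEL_OPENCL_CPU_HSW__)\n"
    "#define TILE 8\n"
    "#else\n"
    "#define TILE 4\n"
    "#endif\n"
    "__kernel void fill_tile(__global int *out)\n"
    "{\n"
    "    out[get_global_id(0)] = TILE;\n"
    "}\n";

/*
 * Hypothetical helper: returns the kernel source and stores its length
 * for the `strings` and `lengths` arguments of clCreateProgramWithSource.
 */
const char *tuned_kernel_source(size_t *length_out)
{
    assert(length_out != NULL);
    *length_out = strlen(TUNED_KERNEL_SRC);
    return TUNED_KERNEL_SRC;
}
```

The final `#else` branch keeps the kernel buildable on devices that report __INTEL_OPENCL_CPU_UNKNOWN__ and on other OpenCL implementations that define none of these macros.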
- Improved heuristics for choosing the local size when an NDRange is enqueued to a command queue that was created with the CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL property (extension …)
- A fix for a previous issue where an incorrect library was loaded when running on Intel® microarchitecture code name Skylake.
- New optional __attribute__((intel_vec_len_hint(<uint>)))
- Fix for the known incompatibility issue with the CPU Kernel Debugger from the Intel® SDK for OpenCL™ Applications 2016 R2 and the CPU only runtime package version 16.1.
- Performance optimizations:
- Compiler vectorizer heuristic tuning for a set of workloads
- Workgroup fusion optimization improvements
- Performance enhancements of the vload()/vstore() built-in functions
- Fix for the issue reported on the forum (https://software.intel.com/en-us/comment/1844607#comment-1844607): vectorizer produces incorrect code on SSE42 architectures when using the samplerless read_imagef() built-in function with image2d_t and int2 coordinates as arguments.
- The cl_khr_gl_sharing extension was disabled due to incompatibility with the Microsoft* Basic Display Adapter. To use this extension, install the OpenCL Driver for Intel® Iris™ Graphics and HD Graphics for Windows* OS from https://software.intel.com/en-us/articles/opencl-drivers#iris. The driver package includes the OpenCL Runtime package for CPUs.
- Due to a performance bug, the Threading Building Blocks (TBB) library was downgraded from "4.2, Interface version 7001, Oct 2 2013" to "4.2, Interface version 7005, Jun 1 2014"
- Support for Intel® Core™ 6th generation and Xeon® v4 processors (former Intel microarchitecture codename Broadwell)
- Support for OpenCL™ 2.0 specification
- Improved cross-CPU support of pre-compiled kernel binary in Runtime:
- Enables loading pre-generated kernel binaries, which saves OpenCL program build time. For more information, see https://software.intel.com/en-us/node/540584
- Enables generating a JIT binary for target CPU model by the Intel® SDK for OpenCL™ - Offline Compiler. For more information, see https://software.intel.com/en-us/node/539388
- Bug and memory leak fixes.
- Compiler infrastructure was updated to LLVM version 3.6.2
- Removed support for the Intel® Xeon Phi™ coprocessors
- New performance-related environment variables:
  - CL_CONFIG_CPU_RT_LOOP_UNROLL_FACTOR for loop unrolling of loops with non-constant trip count (CPU only)
  - CL_CONFIG_USE_FAST_RELAXED_MATH for enabling computations with floating-point calculation optimizations (forcing …)
- Improved Microsoft Visual Studio* debugging of OpenCL kernels on CPU device
- Bug and memory leak fixes
- Several performance enhancements including better auto-vectorization and alias analysis of OpenCL kernels for CPU device.
- 14.2 (deprecated)
- Added support for offline kernel compilation and kernel binary distribution on Intel® Xeon Phi™ coprocessors. With this release, on both Intel® Xeon Phi™ coprocessor and Intel CPU, the kernel binary is the final executable binary in contrast to the previous release, where the kernel binary on Intel Xeon Phi coprocessor was an intermediate code.
- Improved kernel invocation time on Intel Xeon Phi coprocessor device in case of batching kernel commands into in-order queues
- Optimized compiler vectorizer
- New feature - User logger for API tracing and debugging functional failures in OpenCL applications
- New environment variable
- SPIR is now conformant on Intel Xeon Phi coprocessor
- Bug fixes
- 14.1 (deprecated)
- Support for OpenCL Standard Portable Intermediate Representation (SPIR) 1.2 consumption.
- Intel® Manycore Platform Software Stack (Intel® MPSS) 3.2 and 3.2.3 support.
NOTE: Using OpenCL Runtime 14.1 with MPSS 3.2.1 is not recommended, as this combination introduces stability issues.
- Performance improvements:
- Faster execution of code dominated by statically diverging dynamically uniform branches
- More efficient event traversing algorithm
- NO_DMA mode is default, which improves buffer creation speed (not a preview feature anymore)
- Improved device side memory pool control
- CPU only: Starting with this release, the kernel binary is the final machine code. This enables creating the kernel binary offline and distributing it with the application's machine code binary. It also eliminates kernel compilation time in the end-user product (clCreateProgramWithBinary).
- Bug fixed (for Intel® Xeon Phi™ coprocessors only): Compilation crash when a struct is defined globally in the CL file.
- New performance-related environment variables on Intel® Xeon Phi™ coprocessors; see the user guide for details
- Added 32-bit version of the runtime for Windows OS.
- Added OpenCL CPU device support on Intel Core™ processors.