About This Document
This guide provides optimization guidelines for OpenCL™ applications
targeting Intel® Core™ and Intel® Xeon® processors.
If your application targets Intel® processors with Intel® Graphics, refer
to the corresponding OpenCL™ Developer Guide for Intel® Processor Graphics.
Note
The Intel® Xeon Phi™ coprocessor, based on the Intel® Many Integrated
Core (Intel® MIC) Architecture, is supported only on OpenCL™ Runtime version 14.2.
This guide describes the three basic factors that most influence performance
on multi-socket systems:
- Threading scalability. Multi-socket Intel Xeon systems combine many Intel® CPU cores, so exploiting thread parallelism is critical to achieving good performance.
- Vectorization. Intel Xeon processors support wide vector registers and associated SIMD operations.
- Memory bandwidth utilization.
This guide explains which sections of code consume the most compute cycles
and provides best-known methods for optimizing them.
To better understand the optimizations described in this guide,
you should be familiar with the following concepts:
- The OpenCL standard
- Threading and Single Instruction Multiple Data (SIMD) vector instruction sets
See Also
OpenCL™ 1.2 Specification at
Overview Presentations of the OpenCL™ Standard at