• 2019 Update 4
  • 03/20/2019

Using Data Parallelism

Basic data parallelism in the OpenCL™ standard uses the Single Program Multiple Data (SPMD) technique. SPMD resembles fragment processing with pixel shaders in graphics.
In this programming model, a kernel executes concurrently on multiple elements. Each element has its own data and its own program counter. If elements are vectors of any kind (for example, four-way pixel values for an RGBA image), consider using vector types.
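As a rough illustration of the vector-type advice, the plain C sketch below lets one element of work cover a whole RGBA pixel instead of a single channel. The `rgba_t` struct is a hypothetical stand-in for a built-in vector type such as `float4` in OpenCL C, and `scale_pixel`, `gid`, and `gain` are names assumed for this example only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a four-way vector type (float4 in OpenCL C). */
typedef struct { float r, g, b, a; } rgba_t;

/* One element of work handles one complete RGBA pixel.
   gid plays the role of the element index (assumption of this sketch). */
void scale_pixel(size_t gid, const rgba_t *in, float gain, rgba_t *out)
{
    out[gid].r = in[gid].r * gain;
    out[gid].g = in[gid].g * gain;
    out[gid].b = in[gid].b * gain;
    out[gid].a = in[gid].a;  /* alpha is left unchanged in this sketch */
}
```

With a real vector type, the three multiplications would typically collapse into a single vector expression.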
This section describes how to convert regular C code to an OpenCL program using a simple "hello world" example. Consider the following C function:
void scalar_mul(int n, const float *a, const float *b, float *result)
{
    int i;
    for (i = 0; i < n; ++i)
        result[i] = a[i] * b[i];
}
This function performs element-wise multiplication of two arrays, a and b. Each element in result stores the product of the corresponding elements from arrays a and b.
Consider the following:
  • The for loop consists of two parts: the loop statement that defines the range of operation (a single dimension containing n elements), and the loop body itself.
  • The basic operation is done on scalar variables (float data types).
  • Loop iterations are independent.
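The independence of the iterations can be checked with a small C sketch. Each iteration reads and writes only index i, so walking the range in the opposite direction should give the same result. The `scalar_mul_reversed` function is a hypothetical variant written for this illustration.

```c
/* Loop from the text: ascending order. */
void scalar_mul(int n, const float *a, const float *b, float *result)
{
    int i;
    for (i = 0; i < n; ++i)
        result[i] = a[i] * b[i];
}

/* Hypothetical variant: same loop body, descending order.
   No iteration depends on another, so the output is unchanged. */
void scalar_mul_reversed(int n, const float *a, const float *b, float *result)
{
    int i;
    for (i = n - 1; i >= 0; --i)
        result[i] = a[i] * b[i];
}
```

This property is what allows the loop body to be executed for all indices at once.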
The same function in OpenCL appears as follows:
__kernel void scalar_mul(__global const float *a,
                         __global const float *b,
                         __global float *result)
{
    int i = get_global_id(0);
    result[i] = a[i] * b[i];
}
The kernel function performs the same basic element-wise multiplication of two scalar variables. The index is provided by use of a built-in function that gives the global ID, a unique number for each work-item within the grid defined by the NDRange.
The kernel code itself does not imply any parallelism. Parallelism arises only when the kernel is executed over a global grid, with one work-item per grid point, and the device runs those work-items concurrently.
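The split between kernel code and grid execution can be sketched in plain C. This is an emulation rather than OpenCL host code: `scalar_mul_work_item` is the kernel body with the `get_global_id(0)` call replaced by an explicit `gid` parameter, and `run_1d_range` is a hypothetical stand-in for the runtime that visits every point of a one-dimensional grid.

```c
#include <stddef.h>

/* Kernel body from the text; gid replaces get_global_id(0)
   (an assumption of this emulation). */
void scalar_mul_work_item(size_t gid, const float *a, const float *b,
                          float *result)
{
    result[gid] = a[gid] * b[gid];
}

/* Hypothetical stand-in for the runtime: invokes the kernel body once per
   work-item in a 1-D grid of global_size points. This loop is sequential;
   a real device may run the work-items concurrently and in any order. */
void run_1d_range(size_t global_size, const float *a, const float *b,
                  float *result)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        scalar_mul_work_item(gid, a, b, result);
}
```

The kernel body contains no loop and no array length. The range of operation lives entirely in the grid size passed to the runner, which mirrors how the n parameter of the C function disappears from the kernel signature.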
This parallelization method abstracts the details of the underlying hardware. You can write your code according to the native data types of the algorithm. The implementation takes care of the mapping to specific hardware.
