# Using Data Parallelism

The basic data parallelism of the OpenCL™ standard uses the Single Program
Multiple Data (SPMD) technique. SPMD resembles fragment processing with pixel
shaders in the context of graphics.

According to the SPMD technique, a kernel executes concurrently on multiple
elements. Each element has its own data and its own program counter. If
elements are vectors of any kind (for example, four-way pixel values for
an RGBA image), consider using vector types.
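As a rough illustration of treating a four-component element as a single unit, the plain C sketch below emulates what one work-item would do with an RGBA value. The `rgba_t` struct and the `rgba_mul` and `modulate_pixel` helpers are hypothetical stand-ins invented for this example; in an actual OpenCL kernel, the built-in `float4` type and its component-wise operators would take their place.

```c
#include <stddef.h>

/* Hypothetical stand-in for the OpenCL C built-in float4 type. */
typedef struct {
    float r, g, b, a;
} rgba_t;

/* Component-wise multiply, mirroring what `p * q` does for float4 values. */
static rgba_t rgba_mul(rgba_t p, rgba_t q)
{
    rgba_t out = { p.r * q.r, p.g * q.g, p.b * q.b, p.a * q.a };
    return out;
}

/* Work done for ONE element (one pixel); `id` plays the role of the global ID. */
static void modulate_pixel(size_t id, const rgba_t *src, const rgba_t *tint,
                           rgba_t *dst)
{
    dst[id] = rgba_mul(src[id], tint[id]);
}
```

Each call handles all four channels of one pixel together, which is the shape of work that a vector type expresses directly.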

Consider the following code example to understand how to convert regular
C code to OpenCL code:

    void scalar_mul(int n, const float *a, const float *b, float *result)
    {
        int i;
        for (i = 0; i < n; ++i)
            result[i] = a[i] * b[i];
    }

This function performs element-wise multiplication of two arrays, `a` and
`b`. Each element in `result` stores the product of the matching elements
from arrays `a` and `b`.

Note the following:

- The `for` loop statement consists of two parts:
  - Range of operation (a single dimension containing `n` elements)
  - Internal parallel operation
- The basic operation is done on scalar variables (`float` data types).
- Loop iterations are independent.
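The two parts of the loop can be pulled apart explicitly. The following plain C sketch keeps the per-element operation in one function and the range in another. The helper names `mul_one`, `run_forward`, and `run_backward` are made up for illustration. Because the iterations are independent, visiting the indices in a different order should produce the same result.

```c
#include <stddef.h>

/* Internal operation: the work done for a single index. */
static void mul_one(size_t i, const float *a, const float *b, float *result)
{
    result[i] = a[i] * b[i];
}

/* Range of operation: visit indices 0..n-1 in ascending order. */
static void run_forward(size_t n, const float *a, const float *b, float *result)
{
    for (size_t i = 0; i < n; ++i)
        mul_one(i, a, b, result);
}

/* The same range visited in descending order. No iteration reads a value
 * written by another iteration, so the output matches run_forward. */
static void run_backward(size_t n, const float *a, const float *b, float *result)
{
    for (size_t i = n; i-- > 0; )
        mul_one(i, a, b, result);
}
```

Order independence is what allows the range to be handed to a runtime that may process the indices in any order, or at the same time.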

The same function in OpenCL appears as follows:

    __kernel void scalar_mul(__global const float *a,
                             __global const float *b,
                             __global float *result)
    {
        int i = get_global_id(0);
        result[i] = a[i] * b[i];
    }

The kernel function performs the same basic element-wise multiplication
of two scalar variables. The index is provided by use of a built-in function
that gives the global ID, a unique number for each work-item within the
grid (`NDRange`).

The code itself does not imply any parallelism. Only the combination
of the code with the execution over a global grid implements the parallelism
of the device.
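To make the split between kernel code and execution grid concrete, here is a minimal single-threaded emulation in plain C. The names `emu_get_global_id` and `emu_launch_1d` are invented for this sketch and are not OpenCL functions. A real runtime would launch the kernel over the range with `clEnqueueNDRangeKernel` and is free to run work-items concurrently instead of looping over them.

```c
#include <stddef.h>

/* Emulated global ID of the work-item currently being executed. */
static size_t emu_current_id;

/* Stand-in for get_global_id(); only dimension 0 is modeled here. */
static size_t emu_get_global_id(unsigned dim)
{
    return dim == 0 ? emu_current_id : 0;
}

/* Kernel body: mirrors the OpenCL kernel and contains no loop. */
static void scalar_mul_kernel(const float *a, const float *b, float *result)
{
    size_t i = emu_get_global_id(0);
    result[i] = a[i] * b[i];
}

typedef void (*kernel_fn)(const float *, const float *, float *);

/* Emulated 1-D launch: run the kernel once per global ID.
 * This loop is sequential; a device may run work-items in parallel. */
static void emu_launch_1d(size_t global_size, kernel_fn kernel,
                          const float *a, const float *b, float *result)
{
    for (size_t id = 0; id < global_size; ++id) {
        emu_current_id = id;
        kernel(a, b, result);
    }
}
```

Calling `emu_launch_1d(n, scalar_mul_kernel, a, b, result)` reproduces the original loop: the kernel supplies the per-element operation, and the launch supplies the range.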

This parallelization method abstracts the details of the underlying
hardware. You can write your code according to the native data types of
the algorithm. The implementation takes care of the actual mapping to
specific hardware.

Alternatively, you can bind your code to the underlying hardware. This method
provides maximum performance on a specific platform. However, performance on
other platforms might be less than optimal. See the Using Vector Data Types
section for more information (for CPU device only).