Express Data Parallelism (with OpenMP* and Intel® Compilers) Without Large-Scale Modifications to Serial Code


Express data parallelism in a non-invasive manner and produce threaded executables without large-scale modifications to the serial code. Intel® Compilers can automatically parallelize some loops by means of the -Qparallel option. However, the compilers are conservative: they leave a loop serial whenever they cannot guarantee that parallel execution is correct. This limits how much code the method can parallelize.
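To make the compiler's caution concrete, the following sketch contrasts two loops. The function names `scale` and `prefix_sum` are hypothetical and are not taken from any Intel sample. Whether a particular compiler version actually parallelizes the first loop depends on its own heuristics, so treat this as an illustration of the analysis problem rather than a statement of what the compiler will do.

```c
#include <stddef.h>

/* Every iteration is independent, and restrict promises that out and in
   do not overlap. A compiler has the information it needs to consider
   this loop a candidate for automatic parallelization. */
void scale(float *restrict out, const float *restrict in, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * in[i];
}

/* Iteration i reads the value written by iteration i - 1 (a loop-carried
   dependence). Running the iterations concurrently as written would give
   wrong results, so the loop has to stay serial. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Loops that fall between these two cases, for example loops over pointers that might alias, are the ones the compiler cannot prove safe and therefore leaves serial.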

It is also possible to express data parallelism with explicit threading techniques like Pthreads, but it is an invasive process. The computation must be separated into a function that can be mapped to threads, within which the work must be manually divided among the threads. Explicit synchronization must also be added to guarantee correct results. Such major code modifications can add unacceptably to development time and cost.
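The sketch below shows what this restructuring can look like for a numerical integration of pi written with Pthreads. The names `pi_task`, `pi_worker`, `compute_pi_pthreads`, and `MAX_THREADS` are assumptions made for this illustration. The loop body moves into a thread function, the interval range is divided by hand, and `pthread_join` serves as the synchronization point before the partial sums are combined.

```c
#include <pthread.h>
#include <stddef.h>

#define INTERVALS 100000
#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Per-thread work description (hypothetical helper type). */
typedef struct {
    int begin;       /* first interval index handled by this thread */
    int end;         /* one past the last interval index */
    double partial;  /* partial sum written by the thread */
} pi_task;

/* The computation has to live in its own function to be mapped to threads. */
static void *pi_worker(void *arg)
{
    pi_task *task = (pi_task *)arg;
    const double width = 1.0 / INTERVALS;
    double sum = 0.0;

    for (int i = task->begin; i < task->end; i++) {
        double x = width * (i + 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    task->partial = sum;
    return NULL;
}

/* Returns the estimate of pi, or -1.0 if a thread could not be created. */
double compute_pi_pthreads(int num_threads)
{
    pthread_t threads[MAX_THREADS];
    pi_task tasks[MAX_THREADS];

    if (num_threads < 1) num_threads = 1;
    if (num_threads > MAX_THREADS) num_threads = MAX_THREADS;

    /* Manual work division: one contiguous chunk per thread. */
    int chunk = INTERVALS / num_threads;
    int created = 0;
    for (int t = 0; t < num_threads; t++) {
        tasks[t].begin = t * chunk;
        tasks[t].end = (t == num_threads - 1) ? INTERVALS : (t + 1) * chunk;
        tasks[t].partial = 0.0;
        if (pthread_create(&threads[t], NULL, pi_worker, &tasks[t]) != 0)
            break;
        created++;
    }

    /* Explicit synchronization: wait for each thread before reading its sum. */
    double pi = 0.0;
    for (int t = 0; t < created; t++) {
        pthread_join(threads[t], NULL);
        pi += tasks[t].partial;
    }
    if (created != num_threads)
        return -1.0;
    return pi / INTERVALS;
}
```

A call such as `compute_pi_pthreads(4)` runs the integration on four threads; with GCC or Clang the program is typically built with the `-pthread` flag. Roughly fifty lines of scaffolding surround a three-line loop, which is the development cost the paragraph above describes.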


Describe parallelism to the compiler using OpenMP*, a directive-based syntax. The -Qopenmp option tells the Intel® Compilers to process OpenMP directives or pragmas to produce a threaded executable. A compiler that does not understand OpenMP ignores the directives and compiles the code without error. This is a key advantage of OpenMP over other parallel programming methods: it is incremental and relatively non-invasive. OpenMP can parallelize specific loops or regions of the program without large-scale code modifications, and the original serial code is left largely intact.
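As a minimal sketch of this incremental property, the code below builds with or without OpenMP enabled. The function names `available_threads` and `dot` are hypothetical. The `_OPENMP` macro is defined by the OpenMP specification for compilations in which OpenMP is active, so it can guard the few places where the code calls the OpenMP runtime library directly.

```c
#ifdef _OPENMP
#include <omp.h>   /* runtime library header, present only with OpenMP */
#endif

/* Hypothetical helper: upper bound on threads a parallel region may use. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;      /* pragmas were ignored, so the program is serial */
#endif
}

/* The pragma is the only OpenMP-specific line. A compiler without OpenMP
   support skips it and compiles an ordinary serial loop. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

The same source file therefore serves as both the serial and the threaded version, and the choice is made entirely by the compiler option.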

OpenMP uses an explicit fork/join model: the programmer must specify the start and end of each parallel region. An example is shown in the following code sample:

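    #include <stdio.h>

    #define INTERVALS 100000

    int main ()
    {
        int i;
        float n_1, x, pi = 0.0;

        n_1 = 1.0 / INTERVALS;   /* width of one interval */

        /* x is private to each thread; the per-thread copies of pi
           are summed when the loop ends */
        #pragma omp parallel for private(x) reduction(+:pi)
        for (i = 0; i < INTERVALS; i++)
        {
            x = n_1 * ((float)i + 0.5);   /* midpoint of interval i */
            pi += 4.0 / (1.0 + x * x);
        }

        pi *= n_1;

        printf ("Pi = %f\n", pi);
        return 0;
    }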



This code calculates the value of pi by numerical integration. The parallelism can be expressed by a single OpenMP pragma, leaving the underlying serial code intact.
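The combined `parallel for` directive hides the fork and the join in a single line. The sketch below separates them to show where each happens. The function name `compute_pi_fork_join` is hypothetical, and `double` is used here instead of `float` purely to reduce rounding error in the sum.

```c
#define INTERVALS 100000

double compute_pi_fork_join(void)
{
    const double width = 1.0 / INTERVALS;
    double sum = 0.0;

    /* Fork: a team of threads begins executing the block below. */
    #pragma omp parallel
    {
        /* Work sharing: the iterations are divided among the team, and
           each thread accumulates into its own private copy of sum. */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < INTERVALS; i++) {
            double x = width * (i + 0.5);   /* midpoint of interval i */
            sum += 4.0 / (1.0 + x * x);
        }
    }
    /* Join: the team has synchronized at the end of the region, the
       private copies have been combined, and one thread continues. */

    return sum * width;
}
```

Code before the opening brace and after the closing brace of the parallel region runs on a single thread, exactly as in the serial program.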



