Developer Guide and Reference

OpenMP* Advanced Issues

This topic discusses how to use the OpenMP* library functions and environment variables, and gives some guidelines for enhancing performance with OpenMP*.
OpenMP* provides specific function calls and environment variables. See the related topics to refresh your memory about the primary functions and environment variables used in this topic.
To use the function calls, include the omp.h header file, which is installed in the INCLUDE directory during the compiler installation, and compile the application using the [Q]openmp option.
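As a minimal sketch of the include-and-compile step described above, the following wraps one OpenMP* library call so that the same source also builds when the OpenMP* option is omitted. The helper name max_threads is hypothetical. The _OPENMP macro is the one the OpenMP* specification defines when OpenMP* compilation is enabled, and the fallback value of 1 is an assumption about what a serial build should report.

```c
#ifdef _OPENMP
#include <omp.h>   /* only available/needed when OpenMP* is enabled */
#endif

/* Hypothetical helper: upper bound on the number of threads a new
   parallel region may use, or 1 when built without OpenMP* support. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;   /* assumption: a serial build behaves like one thread */
#endif
}
```

Calling max_threads() from main and printing the result should show a value of at least 1 in either build. The [Q]openmp notation is generally understood to mean -qopenmp on Linux* and /Qopenmp on Windows*; check the option reference for your compiler version.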
The following example, which demonstrates how to use the OpenMP* functions to print the alphabet, also illustrates several important concepts:
  1. When using functions instead of pragmas, your code must be rewritten; rewrites can mean extra debugging, testing, and maintenance efforts.
  2. It becomes difficult to compile without OpenMP* support.
  3. It is very easy to introduce simple bugs, as in the loop (below) that fails to print all the letters of the alphabet when 26 is not evenly divisible by the number of threads.
  4. You lose the ability to adjust loop scheduling without creating your own work-queue algorithm, which is a lot of extra effort. You are limited by your own scheduling, which is most likely static scheduling as shown in the example.
Example
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i;

    omp_set_num_threads(4);

    #pragma omp parallel private(i)
    {
        // 26 is not a multiple of the number of threads (4), so the
        // integer division below drops the remainder and the last
        // letters are never printed. This can be considered a bug
        // in this code.
        int LettersPerThread = 26 / omp_get_num_threads();
        int ThisThreadNum = omp_get_thread_num();
        int StartLetter = 'a' + ThisThreadNum * LettersPerThread;
        int EndLetter = 'a' + ThisThreadNum * LettersPerThread + LettersPerThread;

        for (i = StartLetter; i < EndLetter; i++) {
            printf("%c", i);
        }
    }
    printf("\n");
    return 0;
}
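The following sketch shows two possible ways to avoid the dropped-letters bug in the example above; neither is presented as the guide's own fix. The hypothetical helper letter_range keeps the manual, function-call style but spreads the remainder over the first threads. The hypothetical fill_alphabet uses a work-sharing pragma instead, so the runtime divides the iterations, the schedule can be changed in one place, and the code still compiles (serially) without OpenMP* support. Both write into a buffer rather than printing, which is an assumption made here so the result does not depend on thread ordering.

```c
#include <assert.h>

#define NUM_LETTERS 26

/* Hypothetical helper: half-open range [*start, *end) of letter indices
   owned by thread `tid` out of `nthreads`. The first
   (NUM_LETTERS % nthreads) threads each take one extra letter, so no
   letter is dropped when the division is not exact. */
void letter_range(int tid, int nthreads, int *start, int *end)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int base  = NUM_LETTERS / nthreads;
    int extra = NUM_LETTERS % nthreads;
    *start = tid * base + (tid < extra ? tid : extra);
    *end   = *start + base + (tid < extra ? 1 : 0);
}

/* Pragma-based alternative: the OpenMP* runtime splits the iterations.
   Each iteration writes a distinct element, so there is no race.
   Without OpenMP* the pragma is ignored and the loop runs serially. */
void fill_alphabet(char out[NUM_LETTERS + 1])
{
    int i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < NUM_LETTERS; i++) {
        out[i] = (char)('a' + i);
    }
    out[NUM_LETTERS] = '\0';
}
```

With four threads, letter_range yields the index ranges [0,7), [7,14), [14,20), and [20,26), which together cover all 26 letters. In fill_alphabet, replacing schedule(static) with another schedule kind changes the distribution without touching the loop body.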
Debugging threaded applications is a complex process because debuggers change the run-time performance, which can mask race conditions. Even print statements can mask issues, because they use synchronization and operating system functions. OpenMP* itself also adds some complications, because it introduces additional structure by distinguishing private variables and shared variables, and inserts additional code. A debugger that supports OpenMP* can help you to examine variables and step through threaded code. You can use Intel® Inspector to detect many hard-to-find threading errors analytically. Sometimes, a process of elimination can help identify problems without resorting to sophisticated debugging tools.
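To illustrate the private/shared distinction mentioned above, here is a minimal sketch with a hypothetical function, sum_of_squares. The scratch variable tmp is declared outside the parallel region, so it would be shared by default; the private clause gives each thread its own copy, and the reduction clause combines the per-thread partial sums. The function name and the data are assumptions made for illustration.

```c
/* Hypothetical example: sum of the squares of a[0..n-1]. */
long sum_of_squares(const int *a, int n)
{
    long sum = 0;
    long tmp;   /* declared outside the region: shared unless listed */
    int i;

    /* private(tmp): each thread gets its own tmp. Without this clause
       all threads would write the same tmp, which is a race condition.
       reduction(+:sum): each thread accumulates a private partial sum,
       and the partial sums are added together at the end of the loop. */
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (i = 0; i < n; i++) {
        tmp = (long)a[i] * a[i];
        sum += tmp;
    }
    return sum;
}
```

For the array {1, 2, 3, 4} the function returns 30 regardless of the number of threads. Adding default(none) to the directive is one way to make the compiler require an explicit data-sharing attribute for each variable referenced in the region, which can surface variables that were shared by accident.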
Remember that most mistakes are race conditions. Most race conditions are caused by shared variables that really should have been declared private. Start by looking at the variables inside the parallel regions and make sure that the variables are decl