Replace Annotations with OpenMP* Code

This topic explains the steps needed to implement the parallelism proposed by the Intel Advisor annotations by adding OpenMP* parallel framework code.
The recommended order for replacing the annotations with OpenMP code is:
  1. Add appropriate synchronization of shared resources, using LOCK annotations as a guide.
  2. Test to verify that you did not break anything before you add the possibility of non-deterministic behavior with parallel tasks.
  3. Add code to create OpenMP parallel sections or the equivalent, using the SITE/TASK annotations as a guide.
  4. Test with one thread to verify that your program still works correctly. For example, set the environment variable OMP_NUM_THREADS to 1 before you run your program (see the sketch after this list).
  5. Test with more than one thread to see that the multithreading works as expected.
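For steps 4 and 5, it helps to confirm how many threads OpenMP actually uses. A minimal sketch, assuming a C compiler with OpenMP enabled (for example, with the -fopenmp or -qopenmp option):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        // Report the team size chosen for this parallel region;
        // with OMP_NUM_THREADS=1 this should print 1.
        #pragma omp parallel
        {
            #pragma omp single
            printf("Running with %d thread(s)\n", omp_get_num_threads());
        }
        return 0;
    }

Run the program with OMP_NUM_THREADS set to 1, then again with it unset or set higher, and compare both the output and the program's results.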
After you rewrite your code to use the OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well your OpenMP code is vectorized, or use the Offload Modeling perspective to model its performance on a GPU.
OpenMP creates worker threads automatically. In general, you should concern yourself only with the tasks, and leave it to the parallel frameworks to create and destroy the worker threads.
If you do need some control over the creation and destruction of worker threads, see the compiler documentation. For example, to limit the number of threads, set the OMP_THREAD_LIMIT or OMP_NUM_THREADS environment variable.
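The same limit can also be set from inside the program with the standard OpenMP runtime routines. A minimal sketch; the cap of 4 threads is an arbitrary example value:

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        // Request at most 4 threads for subsequent parallel regions,
        // overriding OMP_NUM_THREADS for this run.
        omp_set_num_threads(4);
        printf("Up to %d threads will be used\n", omp_get_max_threads());

        #pragma omp parallel
        {
            // The OpenMP runtime creates and destroys the worker threads;
            // your code only supplies the work done inside the region.
        }
        return 0;
    }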
The examples below show typical serial, annotated program code and the equivalent parallel code using OpenMP, in C/C++ and in Fortran. In each pair, the serial code with Intel Advisor annotations comes first, followed by the OpenMP code.
Serial code with Intel Advisor annotations:

    // Synchronization, C/C++
    ANNOTATE_LOCK_ACQUIRE(0);
    Body();
    ANNOTATE_LOCK_RELEASE(0);

Parallel code using OpenMP:

    // Synchronization can use OpenMP
    // critical sections, atomic operations, locks,
    // and reduction operations (shown later)

Serial code with Intel Advisor annotations:

    ! Synchronization, Fortran
    call annotate_lock_acquire(0)
    body
    call annotate_lock_release(0)

Parallel code using OpenMP:

    ! Synchronization can use OpenMP
    ! critical sections, atomic operations, locks,
    ! and reduction operations (shown later)
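As a concrete illustration of these options, the lock-annotated Body() above could be protected in any of the following ways. A minimal C sketch, assuming the shared state is a double named total (a placeholder name):

    #include <omp.h>

    double total = 0.0;
    omp_lock_t lock;        // initialize once with omp_init_lock(&lock) before use

    void add_with_critical(double x) {
        #pragma omp critical  // mutual exclusion over the statement below
        total += x;
    }

    void add_with_atomic(double x) {
        #pragma omp atomic    // lighter weight, for simple updates like this one
        total += x;
    }

    void add_with_lock(double x) {
        omp_set_lock(&lock);    // acquire, like ANNOTATE_LOCK_ACQUIRE(0)
        total += x;
        omp_unset_lock(&lock);  // release, like ANNOTATE_LOCK_RELEASE(0)
    }

Prefer atomic for simple updates of a scalar; use a critical section or an explicit lock when the protected work is more than one statement.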
Serial code with Intel Advisor annotations:

    // Parallelize data - one task within a
    // C/C++ counted loop
    ANNOTATE_SITE_BEGIN(site);
    for (i = lo; i < n; ++i) {
        ANNOTATE_ITERATION_TASK(task);
        statement;
    }
    ANNOTATE_SITE_END();

Parallel code using OpenMP:

    // Parallelize data - one task, C/C++ counted loops
    #pragma omp parallel for
    for (int i = lo; i < n; ++i) {
        statement;
    }

Serial code with Intel Advisor annotations:

    ! Parallelize data - one task within a
    ! Fortran counted loop
    call annotate_site_begin("site1")
    do i = 1, N
        call annotate_iteration_task("task1")
        statement
    end do
    call annotate_site_end

Parallel code using OpenMP:

    ! Parallelize data - one task within a
    ! Fortran counted loop
    !$omp parallel do
    do i = 1, N
        statement
    end do
    !$omp end parallel do
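When the loop body accumulates into a shared variable, the parallel loop above extends naturally with a reduction clause, which is the reduction operation mentioned in the synchronization example. A minimal C sketch; sum_array, a, and n are placeholder names:

    // Sum in parallel without explicit locking: each thread accumulates
    // a private copy of sum, and OpenMP combines the copies at the end.
    double sum_array(const double *a, int n) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i) {
            sum += a[i];
        }
        return sum;
    }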
Serial code with Intel Advisor annotations:

    // Parallelize C/C++ functions
    ANNOTATE_SITE_BEGIN(site);
    ANNOTATE_TASK_BEGIN(task1);
    function_1();
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(task2);
    function_2();
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();

Parallel code using OpenMP:

    // Parallelize C/C++ functions
    #pragma omp parallel  // start parallel region
    {
        #pragma omp sections
        {
            #pragma omp section
            function_1();
            #pragma omp section
            function_2();
        }
    }  // end parallel region

Serial code with Intel Advisor annotations:

    ! Parallelize Fortran functions
    call annotate_site_begin("site1")
    call annotate_task_begin("task1")
    call subroutine_1
    call annotate_task_end
    call annotate_task_begin("task2")
    call subroutine_2
    call annotate_task_end
    call annotate_site_end

Parallel code using OpenMP:

    ! Parallelize Fortran functions
    !$omp parallel  ! start parallel region
    !$omp sections
    !$omp section
    call subroutine_1
    !$omp section
    call subroutine_2
    !$omp end sections
    !$omp end parallel  ! end parallel region
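OpenMP tasks are an alternative to sections for function-level parallelism, and they also cover cases where the number of tasks is not known at compile time. A sketch using the same placeholder functions as above:

    // Parallelize C/C++ functions with OpenMP tasks
    #pragma omp parallel        // start parallel region
    {
        #pragma omp single      // one thread creates the tasks
        {
            #pragma omp task
            function_1();
            #pragma omp task
            function_2();
            #pragma omp taskwait    // wait for both tasks to complete
        }
    }                           // end parallel region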
