Tutorial

  • 12/16/2019
  • Public Content

Building the OpenMP* Version

To build the OpenMP* version, you will modify the sample application to use OpenMP* parallelization and then compile the modified code. You will then run the application and compare the time with the baseline performance time.
  1. Remove all of the files that were created when you built the serial version by running the following command in a terminal session:
    %make clean
  2. Open the source file src/build_with_openmp/build_with_openmp.cpp in your favorite code editor.
  3. Do the following in the thread_for function:
    • Move the iteration-independent value of mboxsize out of the loop.
    • Remove the validity check of video->next_frame.
      • Exiting a loop in the middle of a parallelized loop is not permitted.
      • The iterations that this check would have skipped are distributed across the threads, so removing it does not affect the result.
    • Add a #pragma omp parallel for to the outermost for loop to maximize the work done per thread.
    • Check your changes against the complete solution shown in tachyon.openmp_solution.cpp.
  4. Build the sample by running the following command in a terminal session:
    %make openmp
The makefile automatically runs the sample after it is built.
Compare the time to render the image to the baseline performance time.
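The loop changes described in step 3 can be illustrated with a minimal sketch. This is not the tutorial's thread_for code: render, shade_pixel, and row_stride are hypothetical names chosen for illustration, and the per-pixel work is a placeholder. It assumes each row can be computed independently of the others.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-pixel work done by a renderer.
static unsigned int shade_pixel(int x, int y) {
    return static_cast<unsigned int>(x * 31 + y * 17) % 256u;
}

// Fills a width x height image row by row.
// If the compiler is not run with OpenMP enabled, the pragma is ignored
// and the loop simply executes serially with the same result.
std::vector<unsigned int> render(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<unsigned int> image(static_cast<std::size_t>(width) * height);

    // Loop-invariant value computed once, before the loop
    // (analogous to hoisting mboxsize out of the loop).
    const std::size_t row_stride = static_cast<std::size_t>(width);

    // The loop body contains no break or return: branching out of a
    // parallelized loop is not allowed.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        // Each iteration writes only to its own row, so no two threads
        // touch the same element.
        for (int x = 0; x < width; ++x) {
            image[y * row_stride + x] = shade_pixel(x, y);
        }
    }
    return image;
}
```

Placing the pragma on the outermost loop gives each thread whole rows to process, which keeps the per-thread work large relative to the scheduling overhead.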
If you wish to explicitly set the number of threads, set the environment variable OMP_NUM_THREADS=N, where N is the number of threads. Alternatively, call the function void omp_set_num_threads(int nthreads), which is declared in omp.h for C and C++ (omp_lib.h is the Fortran counterpart). Make sure to call this function before execution reaches the parallel region.
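As a rough sketch of the programmatic approach, the following C++ function requests a thread count and then reports the team size observed inside a parallel region. The function name run_with_threads is hypothetical. The code includes omp.h, the C/C++ OpenMP header, and guards the OpenMP calls with the standard _OPENMP macro so that it still compiles and runs serially when OpenMP is not enabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Requests `requested` threads for subsequent parallel regions and returns
// the team size observed inside one. Returns 1 when the code is compiled
// without OpenMP support.
int run_with_threads(int requested) {
    assert(requested > 0);
    int team_size = 1;
#ifdef _OPENMP
    // The request must be made before the parallel region starts.
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        // Only one thread records the team size, avoiding a data race.
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#endif
    return team_size;
}
```

The runtime may provide fewer threads than requested, so the returned value is best treated as an upper-bounded observation rather than a guarantee.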
Options that use OpenMP* are available for both Intel and non-Intel microprocessors, but these options may perform additional optimizations on Intel® microprocessors than they perform on non-Intel microprocessors. The list of major, user-visible OpenMP* constructs and features that may perform differently on Intel versus non-Intel microprocessors includes:
  • Internal and user visible locks
  • The SINGLE construct
  • Explicit and implicit barriers
  • Parallel loop scheduling
  • Reductions
  • Memory allocation
  • Thread affinity and binding

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804