Using Partially Parallel Programs with Intel Advisor Tools

Intel Advisor tools are designed to collect data from and analyze serial programs. Before you use the Intel Advisor Suitability and Correctness tools on a partially parallel program to find places to add more parallelism, read the guidelines in this topic and modify your program so that it runs serially, with a single thread within each parallel site.

Run Your Program as a Serial Program

To run the current version of your program as a serial program, limit the number of threads to 1:

  • With Intel® TBB, in the main thread create a tbb::task_scheduler_init init(1); object for the lifetime of the program and run the executable again. For example:

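       #include <tbb/task_scheduler_init.h>

       int main() {
         // Limit the Intel TBB scheduler to one thread for the lifetime of main().
         tbb::task_scheduler_init init(1);
         // ...rest of program...

         return 0;
       }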
    

    The effect of task_scheduler_init applies separately to each user-created thread. So if the program creates threads elsewhere, you need to create a tbb::task_scheduler_init init(1); for that thread's lifetime as well. Use of certain Intel TBB features can prevent the program from running serially. For more information, see the Intel TBB documentation.

  • With Intel® Cilk™ Plus, you can do one of the following:

    • Set the environment variable CILK_NWORKERS to 1 and run the executable again.

    • When using the Intel® C++ Compiler, set the compiler option -cilk-serialize (Linux* OS) or /Qcilk-serialize (Windows* OS) when building the target executable.

    • Before your program's first call to a function that performs a spawn (cilk_spawn or cilk_for), call the __cilkrts_set_param() function (declared in cilk/cilk_api.h) and specify nworkers as 1. For example:

    // Request a single worker before any Cilk code runs.
    if (0 != __cilkrts_set_param("nworkers", "1"))
     {
        printf("Failed to set worker count\n");
        return 1;
     }
    

    Using __cilkrts_set_param() overrides the value (if any) set by the CILK_NWORKERS environment variable. For more information, see the Intel Cilk Plus help.

  • With OpenMP*, do one of the following:

    • Set the OpenMP* environment variable OMP_NUM_THREADS to 1 before you run the program.

    • Omit the compiler option that enables recognition of OpenMP pragmas and directives. On Windows* OS, omit /Qopenmp, and on Linux* OS omit -openmp.

For more information, see your compiler documentation.

Add or Remove Intel Advisor Annotations

The Suitability and Correctness tools use Intel Advisor site, task, and lock annotations. To mark code regions that are already parallel, insert Intel Advisor parallel site and task annotations. For example, in nqueens_cilk.cpp from the nqueens_Advisor sample:

...
  ANNOTATE_SITE_BEGIN(solve);
  cilk_for(int i=0; i<size; i++) {
    // try all positions in first row using separate array for each recursion
    ANNOTATE_ITERATION_TASK(setQueen);
    int * queens = new int[size];
    setQueen(queens, 0, i);
  }
  ANNOTATE_SITE_END();

If needed, you can comment out annotations or wrap them in preprocessor directives for conditional compilation. For example, use the #ifdef, #ifndef, and #endif preprocessor directives:

...
// Comment out the next line to hide the annotations. 
#define ANNOTATE_ON   
.
.
.
#ifdef ANNOTATE_ON
  ANNOTATE_SITE_BEGIN(solve);
#endif
#ifndef ANNOTATE_ON
// insert parallel code here
.
.
.
#endif
#ifdef ANNOTATE_ON
  ANNOTATE_SITE_END();
#endif
... 

After you add the parallel framework code and test it, you can remove the annotations.

Effect of Parallel Code on Intel Advisor Tools' Reports

Intel Advisor tools are designed to collect data from and analyze serial program targets. Parallel code that creates one or more threads within any annotated parallel site usually causes the Suitability or Correctness tool reports to contain unreliable data. To use these two tools, there must be only a single thread within each parallel site. Also, with parallel frameworks that use dynamic scheduling or work stealing at run time, execution times can be attributed to the wrong source code.

If you use the Survey tool to profile your program, the Self Time in the Survey Report shows the sum of the CPU time for all threads. However, because Intel Advisor's purpose is to analyze serial code, some of the time used by parallel code may be added to the wrong places. For example, Self Time may be added to the parallel framework run-time system entry points instead of the caller(s) in the thread that entered the parallel region. Also in the Survey Report, when examining parallel code, some entry points may be parallel framework run-time system entry points instead of the expected functions or loops. Similarly, in the Survey Source window, for a parallel code region the Total Time (and Loop Time) shows the sum of the CPU time for all threads.

Because Intel Advisor's purpose is to analyze serial code, in the Suitability Report:

  • Intel Advisor assumes there is only a single thread (no parallelism) within any annotated parallel site, including its task(s) and lock(s). When only a single thread executes within a parallel site (as expected), the results for that site may be correct. If the application has multiple parallel sites, and one or more sites were executed by multiple threads, the next two items apply.

  • If multiple threads execute within any parallel site, the reported Maximum Program Gain and that site's Maximum Total Gain values are not reliable. To obtain correct values, ensure that only a single thread executes for all parallel sites (see Run Your Program as a Serial Program above).

  • If multiple threads execute within a parallel site, the results for that site will be unpredictable and its values will not be reliable. Also, if one thread executes the parallel site annotations and a second thread executes the task annotation(s), the site may appear to not have any tasks and the tasks may appear to not execute within a site. To obtain correct values, ensure that only a single thread executes within each parallel site (see Run Your Program as a Serial Program above).

  • Any work-stealing constructs within the site will cause extra time to be added to the suspended site and/or task. All Suitability Report times are approximate.

Similarly, if any parallel site uses multiple threads, the Correctness tool may fail to detect certain problems, so they will be missing from the Correctness Report. To obtain reliable results, ensure that only a single thread executes within each parallel site (see Run Your Program as a Serial Program above).
