Site and Task Annotations for Parallel Sites with Multiple Tasks

Parallel site annotations mark the beginning and end of the parallel site. Similarly, begin-end parallel task annotations mark the start and end of each task region. Use this begin-end task annotation pair if there are multiple tasks in a parallel site, if the task code does not include all of the loop body, or for complex loops or code that requires specific task begin-end boundaries, including multiple task end annotations.

Syntax: Parallel Sites with Multiple Tasks

Parallel site annotations that mark the parallel site:

C/C++:

ANNOTATE_SITE_BEGIN(sitename); and ANNOTATE_SITE_END();

Fortran:

call annotate_site_begin(sitename) and call annotate_site_end

C#:

Annotate.SiteBegin(sitename); and Annotate.SiteEnd();

Parallel task annotations that mark each task within the parallel site:

C/C++:

ANNOTATE_TASK_BEGIN(taskname); and ANNOTATE_TASK_END();

Fortran:

call annotate_task_begin(taskname) and call annotate_task_end

C#:

Annotate.TaskBegin(taskname); and Annotate.TaskEnd();

For the C/C++ ANNOTATE_TASK_END(); annotation, the taskname argument is optional.

The taskname must follow the rules for annotation name arguments:

  • For C/C++ code, the taskname must be an ASCII C++ identifier. This should be a name you will recognize when it appears in Intel Advisor tool reports.

  • For Fortran code, the taskname must be a character constant. This should be a name you will recognize when it appears in Intel Advisor tool reports.

  • For C# code, the taskname must be a string. This should be a name you will recognize when it appears in Intel Advisor tool reports.

If you previously used site and task annotations for simple loops with one task and need to convert to this general, multiple-task form, replace the single iteration task annotation (ANNOTATE_ITERATION_TASK in C/C++) with a pair of task begin and task end annotations that mark the task region. Both forms use the same parallel site annotations.
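The conversion described above can be sketched in C++ as follows. This is a minimal illustration rather than code from the product samples: the function names square_all_simple and square_all_general, the site and task names, and the no-op macro stand-ins (present only so the fragment compiles without Intel Advisor) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: no-op stand-ins so this sketch compiles without Intel Advisor.
// In a real build, remove them and include the Intel Advisor annotation
// header (advisor-annotate.h) instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(...)
#define ANNOTATE_ITERATION_TASK(name)
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END(...)

// Simple form: one iteration task annotation covers the whole loop body.
void square_all_simple(const std::vector<int>& in, std::vector<int>& out) {
    assert(out.size() == in.size());
    ANNOTATE_SITE_BEGIN(square_site);
    for (std::size_t i = 0; i < in.size(); ++i) {
        ANNOTATE_ITERATION_TASK(square_task);
        out[i] = in[i] * in[i];
    }
    ANNOTATE_SITE_END();
}

// General form: the same site annotations, but the task region is marked
// explicitly with a task begin and task end annotation pair.
void square_all_general(const std::vector<int>& in, std::vector<int>& out) {
    assert(out.size() == in.size());
    ANNOTATE_SITE_BEGIN(square_site);
    for (std::size_t i = 0; i < in.size(); ++i) {
        ANNOTATE_TASK_BEGIN(square_task);
        out[i] = in[i] * in[i];
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
}
```

Both functions compute the same result. Only the task markup differs, which is what makes it possible to later add a second task or leave part of the loop body outside the task.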

Examples: Parallel Site, Multiple Tasks Not in a Loop

The stats C++ sample application shows task parallelism with multiple tasks that are in a parallel site but not in a loop. In this case, several related statements do a lot of computation work and each can be a separate task:

ANNOTATE_SITE_BEGIN(MySite1);
  cout << "Start calculating running average..."<<endl;
  ANNOTATE_TASK_BEGIN(MyTask1);
  runningAvg(vals, SIZE, rnAvg);
  ANNOTATE_TASK_END(MyTask1);
    
  cout << "Start calculating running standard deviation..."<<endl;
  ANNOTATE_TASK_BEGIN(MyTask2);
  runningStdDev(vals, SIZE, rnStdDev);
  ANNOTATE_TASK_END(MyTask2);
ANNOTATE_SITE_END(MySite1);

In addition to function calls that perform the computations, there are other cases where the Survey tool may indicate that a single statement consumes a lot of CPU time, for example, a Fortran array assignment for a very large array.

Examples: Parallel Site, Multiple Tasks Within a Loop

The annotations in the following C/C++ code fragment specify that each loop iteration contains two separate tasks, each of which can potentially run in parallel with the other task and with the tasks of any other iteration.

 ...
 ANNOTATE_SITE_BEGIN(sitename);
 for (i=0; i<N; i++) {
    ANNOTATE_TASK_BEGIN(task1);
    func1(i);
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(task2);
    func2(i);
    ANNOTATE_TASK_END();
 }
 ANNOTATE_SITE_END();
 ...

The following Fortran code fragment shows the equivalent site and task annotations, where each loop iteration again contains two separate tasks that can potentially run in parallel with each other and with the tasks of any other iteration.

 ...
 call annotate_site_begin("sitename")
   do i=1,size 
      call annotate_task_begin("task1")
      call func1(i)
      call annotate_task_end
      call annotate_task_begin("task2")
      call func2(i)
      call annotate_task_end
   end do
 call annotate_site_end
 ...

The code for each task is marked by a task begin and task end annotation pair inside a parallel site. Code that is not executed in any task is executed by the thread entering the site, which may run in parallel with the identified tasks. In the C/C++ example, the loop control code that increments i and compares i with N is assumed to be executed separately from the explicitly specified tasks. This means that you may see conflicts between tasks and the code outside of any task.

When you run the Correctness tool on the above code, the tool reports data conflicts on any global data that func1 or func2 accesses on a later loop iteration.
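As a hypothetical illustration of the kind of code that leads to such a report, the following C++ sketch has both task functions update one global variable. The global g_sum, the function bodies, the name run_annotated_loop, and the no-op macro stand-ins are assumptions made for this example.

```cpp
#include <cassert>

// Assumption: no-op stand-ins so this sketch compiles without Intel Advisor.
// In a real build, remove them and include the Intel Advisor annotation
// header (advisor-annotate.h) instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(...)
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END(...)

// Hypothetical global state shared by every task in the site.
static long g_sum = 0;

// Both task bodies read and write the same global, so tasks from different
// iterations (and the two tasks of one iteration) touch shared data.
void func1(int i) { g_sum += i; }
void func2(int i) { g_sum += 2 * i; }

long run_annotated_loop(int n) {
    assert(n >= 0);
    g_sum = 0;
    ANNOTATE_SITE_BEGIN(sum_site);
    for (int i = 0; i < n; ++i) {
        ANNOTATE_TASK_BEGIN(task1);
        func1(i);
        ANNOTATE_TASK_END();
        ANNOTATE_TASK_BEGIN(task2);
        func2(i);
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
    return g_sum;
}
```

Run serially, the loop produces a well-defined total. The point of the sketch is that every update of g_sum is an access to shared global data of the kind described above, so the accesses would need to be restructured or synchronized before the loop is actually parallelized.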

The help topic Annotating Parallel Sites and Tasks describes inserting sites and tasks.

Parallel Site and Task Placement

Consider the following two C/C++ code fragments:

 ...
 ANNOTATE_SITE_BEGIN(sitename);
 for (i=0; i<N; i++) {
     ANNOTATE_ITERATION_TASK(taskname);
     func(i);
 }
 ANNOTATE_SITE_END();
 ...
 ...
  for (i=0; i<N; i++) {
     ANNOTATE_SITE_BEGIN(sitename);
     ANNOTATE_TASK_BEGIN(taskfunc1);
     func1(i);
     ANNOTATE_TASK_END();
     ANNOTATE_TASK_BEGIN(taskfunc2);
     func2(i);
     ANNOTATE_TASK_END();
     ANNOTATE_SITE_END();
  }
 ...

In the simple case (the first fragment), the single annotated site encapsulates the entire loop. This allows all iterations of the loop to potentially run at the same time. Use this simple form of loop annotations (two site annotations and one iteration task annotation) for loops whenever possible.

In the second fragment, you are not specifying that all of the loop iterations will run in parallel, but rather that the opportunities for parallelism exist only within a single iteration of the loop. Only the invocations of func1 and func2 from one loop iteration at a time are considered as sources of potential parallelism. As a result, you will never see conflicts between successive invocations of func1, because you are specifying that you do not intend to run them in parallel.

The following diagram compares what the model considers to be in parallel for these two cases, with time progressing from left to right for each case:

Diagram of execution

Boxes that overlap vertically in the diagram are modeled as being executed in parallel.

Executing an ANNOTATE_TASK_BEGIN(taskname) and ANNOTATE_TASK_END() pair delimits the dynamic extent of a task. Each time the annotations are executed during Intel Advisor Correctness or Suitability analysis to collect interactions between tasks, a dynamic extent is identified and associated with the most closely containing dynamic site. Each task is assumed to be independent and able to run in parallel with all other tasks inside the containing site.

Task annotations in a multiple-task parallel site must follow these rules:

  • On every execution path, each task begin annotation must be terminated by a task end annotation.

  • Task boundaries must be within parallel site boundaries.

  • The argument to the task annotations must follow the rules for annotation name arguments.
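The first rule matters most when a task body has more than one exit path. The sketch below places a task end annotation on both the early-exit path and the normal path. To make the pairing checkable, the task macro stand-ins here keep a counter of open tasks; that counter, the function process_items, and the site and task names are assumptions for illustration and are not part of Intel Advisor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping: the number of tasks begun but not yet ended.
static int g_open_tasks = 0;

// Assumption: stand-ins so this sketch compiles without Intel Advisor.
// The task macros only count begin/end pairs; the real annotations come
// from the Intel Advisor annotation header (advisor-annotate.h).
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(...)
#define ANNOTATE_TASK_BEGIN(name) (++g_open_tasks)
#define ANNOTATE_TASK_END(...) (--g_open_tasks)

// Doubles each non-negative item; negative items are written as 0.
// Returns the number of items that took the normal path.
int process_items(const std::vector<int>& items, std::vector<int>& out) {
    assert(out.size() == items.size());
    int processed = 0;
    ANNOTATE_SITE_BEGIN(process_site);
    for (std::size_t i = 0; i < items.size(); ++i) {
        ANNOTATE_TASK_BEGIN(process_task);
        if (items[i] < 0) {
            out[i] = 0;
            ANNOTATE_TASK_END();  // end annotation on the early-exit path
            continue;
        }
        out[i] = items[i] * 2;
        ANNOTATE_TASK_END();      // end annotation on the normal path
        ++processed;              // outside any task: run by the site thread
    }
    ANNOTATE_SITE_END();
    return processed;
}
```

After process_items returns, g_open_tasks is back at zero, which is the property the rule asks for: no execution path leaves a task open.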

The only times tasks are not modeled to be executing in parallel are:

  1. When tasks are using synchronization, the specific code inside the synchronized region will not be modeled to be in parallel with other code synchronized using the same lock addresses.

  2. When one task creates another task, the code of the parent task executed before the second task is created is assumed to execute before the task creation. However, any code executed after the task creation is assumed to be in parallel with the nested task. For example:

  ...
  ANNOTATE_SITE_BEGIN(sitename);
  for (i=0; i<N; i++) {
      ANNOTATE_TASK_BEGIN(taskfunc1a);
      func1a(i);
      ANNOTATE_TASK_BEGIN(taskfunc2);
      func2(i);
      ANNOTATE_TASK_END();
      func1b(i);
      ANNOTATE_TASK_END();
  }
  ANNOTATE_SITE_END();
  ...

In this example, the call to func1a is not modeled as parallel with either func2 or func1b in the same iteration. However, func2 and func1b are modeled as being executed in parallel. This semantic interpretation allows modeling of recursion where nested calls create tasks that execute in parallel. Note that this relationship holds only for tasks inside one iteration. Tasks from different loop iterations are all modeled as parallel because they have no special relationship. For example, func1a from one loop iteration may execute concurrently with func2 from a different iteration.
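The recursion case mentioned above might look like the following sketch, in which each call creates a nested task for the left half of a range and then continues with the right half itself. The functions scale_range and scale_all, the task and site names, the cutoff of two elements, and the no-op macro stand-ins are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: no-op stand-ins so this sketch compiles without Intel Advisor.
// In a real build, remove them and include the Intel Advisor annotation
// header (advisor-annotate.h) instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(...)
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END(...)

// Doubles every element in v[lo, hi). Small ranges are handled directly;
// larger ranges are split, and the left half is marked as a nested task.
void scale_range(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= v.size());
    if (hi - lo <= 2) {
        for (std::size_t i = lo; i < hi; ++i) {
            v[i] *= 2;
        }
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;

    ANNOTATE_TASK_BEGIN(scale_left_half);
    scale_range(v, lo, mid);  // nested task; may create further tasks
    ANNOTATE_TASK_END();

    // Code after the task creation: modeled as running in parallel with
    // the nested task. The two halves touch disjoint elements.
    scale_range(v, mid, hi);
}

void scale_all(std::vector<int>& v) {
    ANNOTATE_SITE_BEGIN(scale_site);
    scale_range(v, 0, v.size());
    ANNOTATE_SITE_END();
}
```

The two halves write to disjoint parts of the vector, so the sketch avoids the shared-data accesses that would otherwise show up as conflicts between a nested task and the code that follows its creation.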

While you are checking correctness, the Correctness tool assumes that all tasks in a given site may execute in parallel unless there is explicit synchronization. For example, in the following case all N invocations of func1 and all N invocations of func2 are modeled as executing in parallel:

  ...
  ANNOTATE_SITE_BEGIN(sitename);
  for (i=0; i<N; i++) {
      ANNOTATE_TASK_BEGIN(taskfunc1);
      func1(i);
      ANNOTATE_TASK_END();
      ANNOTATE_TASK_BEGIN(taskfunc2);
      func2(i);
      ANNOTATE_TASK_END();
  }
  ANNOTATE_SITE_END();
  ...

If you want to model other kinds of relationships, for example, that func2 invocations are serialized in some way, express that constraint using lock annotations that mark a lock that is acquired and released for the duration of that task's execution.
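A sketch of that serialization follows, assuming the C/C++ lock annotations ANNOTATE_LOCK_ACQUIRE and ANNOTATE_LOCK_RELEASE each take an address that identifies the lock. The lock variable g_log_lock, the shared vector g_log, the function bodies, and the stand-in macros are hypothetical and exist only so the fragment compiles without Intel Advisor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: stand-ins so this sketch compiles without Intel Advisor.
// In a real build, remove them and include the Intel Advisor annotation
// header (advisor-annotate.h) instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(...)
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END(...)
#define ANNOTATE_LOCK_ACQUIRE(addr) ((void)(addr))
#define ANNOTATE_LOCK_RELEASE(addr) ((void)(addr))

// Hypothetical lock object: only its address is used, to identify the lock.
static int g_log_lock = 0;
// Hypothetical shared data updated by every func2 invocation.
static std::vector<std::size_t> g_log;

void func1(std::size_t i, std::vector<std::size_t>& out) { out[i] = i * i; }
void func2(std::size_t i) { g_log.push_back(i); }

void run_with_lock(std::size_t n, std::vector<std::size_t>& out) {
    assert(out.size() >= n);
    g_log.clear();
    ANNOTATE_SITE_BEGIN(lock_site);
    for (std::size_t i = 0; i < n; ++i) {
        ANNOTATE_TASK_BEGIN(taskfunc1);
        func1(i, out);
        ANNOTATE_TASK_END();

        ANNOTATE_TASK_BEGIN(taskfunc2);
        // The lock is held for the whole task, so func2 invocations are
        // modeled as serialized with respect to each other.
        ANNOTATE_LOCK_ACQUIRE(&g_log_lock);
        func2(i);
        ANNOTATE_LOCK_RELEASE(&g_log_lock);
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
}
```

In this sketch func1 writes a distinct element on each iteration and needs no lock, while func2 appends to shared data and is therefore wrapped in the acquire and release pair.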

Selecting where to add task annotations may take some experimentation. Consider factors such as average instance time and number of iterations (provided in the Suitability Report). If your parallel site has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outermost loop. See help topics such as How Big Should a Task Be?.
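For nested loops, the placement advice above can be sketched as follows: the task wraps the body of the outer loop, so each task covers a whole inner loop instead of a single cheap inner iteration. The function row_sums, the site and task names, and the no-op macro stand-ins are assumptions for illustration. Whether this granularity is appropriate for a given program is something the Suitability Report would need to confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: no-op stand-ins so this sketch compiles without Intel Advisor.
// In a real build, remove them and include the Intel Advisor annotation
// header (advisor-annotate.h) instead.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(...)
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END(...)

// Computes the sum of each row. One task per row (outer loop iteration),
// not one task per element (inner loop iteration).
std::vector<long> row_sums(const std::vector<std::vector<int>>& m) {
    std::vector<long> sums(m.size(), 0);
    ANNOTATE_SITE_BEGIN(row_sum_site);
    for (std::size_t r = 0; r < m.size(); ++r) {
        ANNOTATE_TASK_BEGIN(row_task);
        long s = 0;  // local to the task, so not shared between tasks
        for (std::size_t c = 0; c < m[r].size(); ++c) {
            s += m[r][c];  // inner loop body is too small to be a task
        }
        sums[r] = s;       // each task writes its own element
        ANNOTATE_TASK_END();
    }
    ANNOTATE_SITE_END();
    assert(sums.size() == m.size());
    return sums;
}
```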
