Data and Task Parallelism
This topic describes two fundamental types of program
execution - data parallelism and task parallelism - and the task patterns of
each.
Data Parallelism
In many programs, most of the work is done
processing items in a collection of data, often in a loop. The data parallelism
pattern is designed for this situation. The idea is to process each data item
or a subset of the data items in separate task instances. In general, the
parallel site contains the code that invokes the processing of each data item,
and the processing is done in a task.
In the most common version of this pattern, the
serial program has a loop that iterates over the data items, and the loop body
processes each item in turn. The data parallelism pattern makes the whole loop
a parallel site, and the loop body is a task. Consider this simple C/C++ loop:
ANNOTATE_SITE_BEGIN(sitename);
for (int i = 0; i != n; ++i) {
    ANNOTATE_ITERATION_TASK(task_process);
    process(a[i]);
}
ANNOTATE_SITE_END();
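The annotations above only describe the intended parallelism to the tool; they do not themselves run anything in parallel. As a rough sketch of what the parallelized loop might eventually look like - assuming a hypothetical process() that squares one element - plain C++ threads can split the iteration range into chunks:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the document's process(): square one element.
static void process(int &x) { x = x * x; }

// One possible realization of the data-parallel loop: split the index
// range [0, n) into contiguous chunks and give each chunk to a thread.
void process_all(std::vector<int> &a) {
    const unsigned workers =
        std::max(1u, std::thread::hardware_concurrency());
    const std::size_t n = a.size();
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = w * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        if (lo >= hi) break;
        pool.emplace_back([&a, lo, hi] {
            for (std::size_t i = lo; i < hi; ++i)
                process(a[i]);   // each chunk plays the role of one task
        });
    }
    for (std::thread &t : pool) t.join();
}
```

The chunking is safe because each thread touches a disjoint range of the collection, which is exactly the independence the annotation asserts.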
The following C/C++ code shows a situation where the data items to be
processed are in the nodes of a tree. The recursive tree walk is part of the
serial execution of the parallel site - only the process_node calls are
executed in separate tasks.
ANNOTATE_SITE_BEGIN(sitename);
process_subtree(root);
ANNOTATE_SITE_END();
. . .
void process_subtree(node) // in the dynamic extent of the parallel site
{
    ANNOTATE_TASK_BEGIN(task_process);
    process_node(node);
    ANNOTATE_TASK_END();
    for (child = first_child(node); child; child = next_child(child)) {
        process_subtree(child);
    }
}
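For comparison, here is one way the same shape could execute in plain C++ - a sketch only, with a hypothetical Node type and a hypothetical process_node() that sums node values. The recursive walk stays serial; each process_node call is launched as an asynchronous task:

```cpp
#include <atomic>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node type; the document assumes but does not define one.
struct Node {
    int value = 0;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical process_node(): accumulate values (atomically, since the
// tasks run concurrently).
std::atomic<int> total{0};
void process_node(const Node &n) { total += n.value; }

// The tree walk itself remains serial; only process_node runs as a task.
void process_subtree(const Node &n, std::vector<std::future<void>> &tasks) {
    tasks.push_back(std::async(std::launch::async,
                               [&n] { process_node(n); }));
    for (const auto &child : n.children)
        process_subtree(*child, tasks);
}
```

The caller waits on the collected futures after the walk finishes, which corresponds to the implicit join at the end of the parallel site.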
In the data parallelism pattern, the parallel site
usually contains a single task.
The sample tachyon_Advisor demonstrates data parallelism.
Task Parallelism
When work is divided into several activities which
you cannot parallelize individually, you may be able to take advantage of the
task parallelism pattern.
The word task is used here in the general sense of an activity or job. It is
just a coincidence that we use the same word to refer to "a body of code that
is executed independently of other bodies of code".
In this pattern, you have multiple distinct task
bodies in a parallel site performing different activities at the same time.
Suppose that neither the display nor the update
operation from the previous example can be parallelized individually. You still
might be able to do the display and the update simultaneously. Consider this
C/C++ code:
initialize(data);
while (!done) {
    old_data = data;
    ANNOTATE_SITE_BEGIN(sitename);
    ANNOTATE_TASK_BEGIN(task_display);
    display_on_screen(old_data);
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(task_updatedata);
    update(data);
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();
}
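Again as a sketch only - with hypothetical stand-ins for display_on_screen() and update() - the two task bodies map naturally onto two threads that run at the same time:

```cpp
#include <string>
#include <thread>

// Hypothetical stand-ins for the routines the document assumes.
static std::string screen;
static void display_on_screen(int v) { screen = "frame " + std::to_string(v); }
static void update(int &v) { ++v; }

// Two distinct task bodies run simultaneously: one displays the old data
// while the other computes the new data.
void step(int &data) {
    const int old_data = data;
    std::thread shower([old_data] { display_on_screen(old_data); });
    std::thread updater([&data] { update(data); });
    shower.join();
    updater.join();
}
```

This is safe only because the display task reads a private copy (old_data) while the update task writes data, mirroring the old_data = data copy in the annotated loop.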
The most obvious shortcoming of the task parallelism pattern is that it
cannot take advantage of more cores than there are distinct tasks. In this
example, any cores beyond the first two would be idle. On the other hand, the
task parallelism pattern may apply to programs that simply do not fit the
data parallelism pattern - some parallelism may be better than none.
The tasks used in task parallelism are not limited
to called functions. For example, consider this C/C++ code that creates two
tasks that separately increment the variables X and Y:
int main() {
    ANNOTATE_SITE_BEGIN(sitename);
    ANNOTATE_TASK_BEGIN(task_x);
    X++;
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(task_y);
    Y++;
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();
}
The sample stats demonstrates task parallelism.