Intel® C++ Compiler 19.0 Developer Guide and Reference
Intel® Cilk™ Plus is a deprecated feature. Use OpenMP* or Intel® Threading Building Blocks instead. For more information see Migrate Your Application to use OpenMP* or Intel® TBB Instead of Intel® Cilk™ Plus.
Intel® Cilk™ Plus is a deprecated feature. Use OpenMP* or Intel® Threading Building Blocks instead. For more information see Migrate Your Application to use OpenMP* or Intel® TBB Instead of Intel® Cilk™ Plus.
Races are a major cause of bugs in parallel programs.
A determinacy race occurs when two parallel strands access the same memory location and at least one strand performs a write operation. The program result depends on which strand "wins the race" and accesses the memory first.
A data race is a special case of a determinacy race. A data race is a race condition that occurs when two parallel strands, holding no locks in common, access the same memory location and at least one strand performs a write operation. The program result depends on which strand "wins the race" and accesses the memory first.
If the parallel accesses are protected by locks, there is no data race. However, a program using locks may not produce deterministic results. A lock can ensure consistency by protecting a data structure from being visible in an intermediate state during an update, but does not guarantee deterministic results.
For example, consider the following program:
int a = 2; // declare a variable that is // visible to more than one worker void Strand1() { a = 1; } int Strand2() { return a; } void Strand3() { a = 2; } int main() { int result; cilk_spawn Strand1(); result = cilk_spawn Strand2(); cilk_spawn Strand3(); cilk_sync; std::cout << "a = " << a << ", result = " << result << std:endl; }
Because Strand1(), Strand2() and Strand3() may run in parallel, the final value of a and result can vary, depending on the order in which they run.
Strand1() may write the value of a before or after Strand2() reads a, so there is a read/write race between Strand1() and Strand2().
Strand3() may write the value of a before or after Strand1() writes a, so there is a write/write race between Strand3() and Strand1().
Some data races are benign. In other words, although there is a race, the race does not affect the output of the program.
Here is a simple example:
bool bFlag = false; cilk_for (int i=0; i<N; ++i) { if (some_condition()) bFlag = true; } if (bFlag) do_something();
This program has a write/write race on the bFlag variable. However, all of the write operations are writing the same value (true) and the value is not read until after the cilk_sync that is implicit at the end of the cilk_for loop.
In this example, the data race is benign. No matter what order the loop iterations execute, the program produces the same result.
Intel® Cilk™ Plus is a deprecated feature. Use OpenMP* or Intel® Threading Building Blocks instead. For more information see Migrate Your Application to use OpenMP* or Intel® TBB Instead of Intel® Cilk™ Plus.
There are a number of ways to resolve a race condition:
Fix a bug in your program
Use local variables instead of global variables
Restructure your code
Change your algorithm
Use reducers
Use a lock
Fix a bug in your program
A race condition can be caused by a bug in the program logic. For instance, a race can occur if a recursive sort calls use an overlapping region, and, therefore, references the same memory location in parallel. The solution is to fix the application.
Use local variables instead of global variables
Consider the following program:
#include <cilk/cilk.h> #include <iostream> const int IMAX=5; const int JMAX=5; int a[IMAX * JMAX]; int main() { int idx; cilk_for (int i=0; i<IMAX; ++i) { for (int j=0; j<JMAX; ++j) { idx = i*JMAX + j; // This is a race. a[idx] = i+j; } } for (int i=0; i<IMAX*JMAX; ++i) std::cout << i << " " << a[i] <<std::endl; return 0; }
This program has a race on the idx variable, because it is accessed in parallel in the cilk_for loop. Because idx is only used inside the loop, it is easy to resolve the race by making idx local within the loop:
int main() { // int idx; // Remove global cilk_for (int i=0; i<IMAX; ++i) { for (int j=0; j<JMAX; ++j) { int idx = i*JMAX + j; // Declare idx locally a[idx] = i+j; } } ... }
Restructure your code
In some cases, you can eliminate the race by a simple rewrite. Here is another way to resolve the race in the previous program:
int main() { // int idx; // Remove global cilk_for (int i=0; i<IMAX; ++i) { for (int j=0; j<JMAX; ++j) { // idx = i*JMAX + j; // Don't use idx a[i*JMAX + j] = i+j; // Instead, // calculate as needed } } ... }
Change your algorithm
One of the best solutions, although not always easy or even possible, is to find an algorithm that partitions your problem such that the parallelism is restricted to calculations that cannot race.
Use reducers
Reducers are designed to be race-free objects that can be safely used in parallel. See Reducers for more information.
Use locks
Locks can be used to resolve data race conditions. The drawbacks to this approach are described in Introductions to Using Locks. There are several kinds of locks, including:
The following program contains a race on sum, because the statement sum=sum+i both reads and writes sum:
#include <cilk/cilk.h> int main() { int sum = 0; cilk_for (int i=0; i<10; ++i) { sum = sum + i; // THERE IS A RACE ON SUM } }
Using a lock to resolve the race:
#include <cilk/cilk.h> #include <tbb/mutex.h> #include <iostream> int main() { tbb::mutex mut; int sum = 0; cilk_for (int i=0; i<10; ++i) { mut.lock(); sum = sum + i; // PROTECTED WITH LOCK mut.unlock(); } std::cout << "Sum is " << sum << std::endl; return 0; }
This example is for illustration only. A reducer is typically a better way to solve this kind of race.