Why Parallel Inspector does not detect an obvious data race

One of the main goals of Threading Errors analysis in Parallel Inspector is detecting data races, which can lead to improper operation of applications or to data corruption. Sometimes the code constructs that provoke data races are hidden in a complicated implementation, and Parallel Inspector is the best tool to help find such constructs.

However, there are some corner cases in which an obvious problem in the implementation is not detected even though the analysis is sufficiently intrusive. One such case is the parallel code sample below.

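#include <stdio.h>
#include <omp.h>

int g_var;  /* shared global, deliberately left unprotected */

void TestFunc(int par)
{
      /* Shows which thread executes this call */
      printf("Thread# %d\n", omp_get_thread_num());
      if (par == 0)
            g_var++;    /* unsynchronized update in one thread */
      if (par != 0)
            g_var--;    /* unsynchronized update in the other thread */
}

int main(int argc, char* argv[])
{
      omp_set_num_threads(2);

      /* Two iterations, so each of the two threads calls TestFunc() once */
#pragma omp parallel for
      for (int i=0; i<2; i++)
            TestFunc(i);

      printf("%d\n", g_var);
      return 0;
}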



A few comments on the code:

- It is an OpenMP implementation that executes the function TestFunc() in parallel

- TestFunc() is called once by each of the two threads

- Depending on the input parameter, each thread executes one of the two branches, incrementing or decrementing the global variable

- The global variable is not protected by any synchronization mechanism

- The printf call in TestFunc() shows that both threads are active.
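For contrast, the missing protection could be added with an OpenMP atomic construct. The following is only a minimal sketch and not part of the original test case: the names TestFuncAtomic and RunAtomicSample are hypothetical, the printf call is left out for brevity, and the omp.h include is guarded so the code still builds (and runs serially) when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

int g_var;  /* same shared global as in the sample above */

/* Hypothetical variant of TestFunc(): same branches, but each update
   is performed as an atomic read-modify-write. */
void TestFuncAtomic(int par)
{
    if (par == 0) {
        #pragma omp atomic
        g_var++;
    } else {
        #pragma omp atomic
        g_var--;
    }
}

/* Hypothetical helper: runs the two-iteration loop on two threads
   and returns the final value of g_var. */
int RunAtomicSample(void)
{
    g_var = 0;
    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 2; i++)
        TestFuncAtomic(i);
    return g_var;
}
```

With the updates made atomic, one increment and one decrement always cancel out, so calling RunAtomicSample() from main() should return 0 regardless of how the threads interleave. The unprotected sample above is kept as is in this article because the race is exactly what is being tested.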

 

The test case was compiled with Intel C++ Compiler v.10.1 and analyzed with Intel Parallel Inspector 1.0 Update 1 (build #66191) at Threading Errors analysis level ti2. As the screenshot of the analysis results shows, the data race is not detected in this case. Level ti3 does not detect the error either.


insp1.JPG

A persistent investigator would soon discover that Intel Thread Checker easily finds the error and reports the data race on the global variable. This confirms the suspicion that there really is a data race here.

 

Intel Parallel Inspector offers several levels of intrusiveness in its analysis to balance time and resource use against the level of threading error detection. If ti4, the highest level, is selected, this data race is detected as well.

insp2.JPG

Detection of this data race at levels ti2 and ti3 was skipped in order to improve analysis performance. One shortcut that significantly decreases overhead at these intermediate levels is to avoid checking for data races when memory is known to be shared among only two threads and touched only once by each. Even in that situation Parallel Inspector may still find a data race, depending on the application. We believe corner cases like the sample above are extremely rare in real applications. A slight modification of the code (setting more threads or adding more operations on the global variable) may change Parallel Inspector's ability to detect the data race at these intermediate levels.
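To make the shortcut easier to picture, here is a toy model of the rule as it is worded above. It is purely illustrative and is not Parallel Inspector's implementation: LocationStats, RecordAccess, ShortcutSkipsLocation and MAX_THREADS are hypothetical names, and the model assumes that each g_var++ or g_var-- counts as a single touch of the location.

```c
#include <stdbool.h>

#define MAX_THREADS 8  /* assumption: small fixed bound for this sketch */

/* Hypothetical per-location record; not the tool's real data structure. */
typedef struct {
    int accesses[MAX_THREADS];  /* touches seen from each thread */
} LocationStats;

/* Record one touch of the location by thread tid. */
void RecordAccess(LocationStats *loc, int tid)
{
    if (tid >= 0 && tid < MAX_THREADS)
        loc->accesses[tid]++;
}

/* Returns true when the rule described above would skip race checking:
   exactly two threads touched the location, each exactly once. */
bool ShortcutSkipsLocation(const LocationStats *loc)
{
    int threads = 0;
    for (int t = 0; t < MAX_THREADS; t++) {
        if (loc->accesses[t] == 0)
            continue;           /* this thread never touched it */
        if (loc->accesses[t] > 1)
            return false;       /* touched more than once by one thread */
        threads++;
    }
    return threads == 2;
}
```

In this model the sample's g_var, touched once by thread 0 and once by thread 1, is skipped. Recording a second touch from either thread, or a touch from a third thread, makes ShortcutSkipsLocation() return false, which mirrors the remark that more threads or more operations on the variable may change the outcome.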

 

As for the performance improvement, a user might find that Intel Parallel Inspector analyzes threading errors significantly faster at the intermediate levels than Intel Thread Checker does.
