Data Race Identified on Atomic Construct (Win32 Interlocked… functions, #pragma omp atomic, or atomic template)

Problem : Intel® Parallel Inspector identifies a data race on an atomic construct (Win32 Interlocked… functions, OpenMP* #pragma omp atomic, or variables declared with Intel® Threading Building Blocks atomic<T>).

Root Cause : For atomics, Intel Parallel Inspector checks whether all accesses to a variable are atomic. If two accesses occur at the same time and they are not both atomic, it reports a data race.

Resolution : Either place every access (reads and writes) to the variable in an atomic construct, or suppress the warning with the Intel Parallel Inspector suppression feature.
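
As a rough illustration of the first option, the sketch below uses C++11 std::atomic as a portable stand-in for the constructs named in this article (that substitution is an assumption, and the function name count_with_atomic_accesses is hypothetical). Every read and write of the shared counter is an atomic operation at the language level. Whether Intel Parallel Inspector still flags the reads depends on the instructions the compiler emits, as discussed in the comments.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example: every access to `counter`, read or write, is atomic.
long count_with_atomic_accesses(int threads, int increments_per_thread) {
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1);   // atomic read-modify-write
                (void)counter.load();   // atomic read while other threads write
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();              // atomic read of the final value
}
```

The same idea applies to the other constructs: no plain read or write of the variable should be able to overlap with an atomic update.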

More Information :
In Intel Threading Building Blocks, all operations (including reads) on objects of type atomic<T> are implicitly atomic. For more information, see section 6.2 of the Intel TBB Reference Manual: http://software.intel.com/sites/default/files/m/c/2/a/301114.pdf. If Intel® Parallel Inspector reports a data race on such a variable, you can safely suppress it in Intel Parallel Inspector.

The Win32 Interlocked variable access documentation states that atomicity is guaranteed only with respect to other Interlocked functions. Additionally, the variable must be aligned. For more information, see the documentation for your specific Interlocked function, for example: http://msdn.microsoft.com/en-us/library/ms683614(VS.85).aspx

#pragma omp atomic is likewise atomic only with respect to other #pragma omp atomic operations. See Section 2.8.5 of the OpenMP specification at http://software.intel.com/sites/default/files/m/e/1/c/spec30.pdf
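
A minimal sketch of that rule, assuming a compiler with OpenMP enabled (the function name sum_with_omp_atomic is hypothetical): every access to the shared variable that can overlap with another thread is an omp atomic update, and the only plain read happens after the parallel region has finished. If the code is built without OpenMP support, the pragmas are ignored and the loop simply runs serially.

```cpp
#include <cassert>

// Hypothetical example: all concurrent accesses to `sum` are omp atomic updates.
long sum_with_omp_atomic(int n) {
    assert(n >= 0);
    long sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp atomic
        sum += 1;
    }
    // The parallel region ends with an implicit barrier, so this plain read
    // cannot overlap with any of the atomic updates above.
    return sum;
}
```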

That’s the theory; now for something more practical. If you declare a variable of a basic data type (no larger than 32 bits on an IA-32 architecture machine or 64 bits on an Intel® 64 architecture machine) that is aligned and declared volatile, most C/C++ compilers will use a single atomic machine instruction to read the variable, even when it is read outside an atomic construct. Section 7.1.1 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, System Programming Guide, Part 1 describes which reads of variables are guaranteed to be atomic, and also allows some additional atomic operations on unaligned data and larger data sizes. In these cases you can safely suppress the data race using Intel Parallel Inspector’s suppression feature. Note, however, that your code may not be portable to another set of compiler switches, another compiler, or another architecture.
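
Before suppressing a report on these grounds, it can help to check the size and alignment conditions explicitly. The helper below is a hypothetical sketch, not part of any Intel tool. It assumes that pointer width matches the native word size (32 bits on IA-32, 64 bits on Intel 64) and tests only natural alignment and size. It says nothing about whether the compiler actually emitted a single load instruction, which still has to be confirmed in the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical check: does the object at `p` meet the size and alignment
// conditions described above for a single-instruction read?
inline bool plain_read_is_candidate(const volatile void* p, std::size_t size) {
    assert(p != nullptr);
    if (size == 0 || (size & (size - 1)) != 0) return false;  // power-of-two sizes only
    if (size > sizeof(void*)) return false;  // assumption: pointer width == native word size
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;   // naturally aligned
}
```

For example, an aligned volatile 32-bit integer passes the check, while a 4-byte object that starts one byte into an aligned buffer does not.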


Comments

Dmitry Vyukov:

Hmmm... How can Inspector detect a data race on a TBB atomic? Does Inspector allow false positives? If so, are they specific to TBB atomics, or are they related to the underlying method of verification?

All about lock-free algorithms, multicore, scalability, parallel computing and related topics: http://www.1024cores.net
Dmitry Vyukov:

Regarding Win32 Interlocked, how does Inspector treat loads of variables? There is no InterlockedRead(), so it's common practice to just load the variable.

Dmitry Vyukov:

Does Inspector detect races when one thread uses a 32-bit Interlocked function on a memory location and another thread uses a 64-bit Interlocked function on the same memory location (the actual addresses may differ by 32 bits)?

Eric W Moore (Intel):

Intel Parallel Inspector does not know whether you used TBB, OpenMP, or Interlocked... functions. It examines the binary: if one thread accesses a variable with a LOCK prefix on the assembly instruction and another thread uses one of the many ordinary ways to read the variable, then it reports a data race.

This may or may not be a true data race; you will have to analyze the code or binary yourself to determine which.

From a language/library perspective, there is no way to safely "read" a memory variable with the Interlocked... functions or with OpenMP. But you could do:
CurrentAtomicVal=InterlockedAdd(&AtomicVar,0)
or
#pragma omp atomic
AtomicVal+=0;

But "I don't think" that is common practice (and I haven't even tried it). I think the common practice is to depend on the fact that the architecture and compiler generally make reads atomic. This is an unsafe assumption, but that is what is done.
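
For comparison, the same "add zero" idiom can be sketched portably with C++11 std::atomic. This is an assumption made for illustration, not the Win32 or OpenMP API, and the helper name read_via_rmw is hypothetical. Note that fetch_add returns the value held before the addition, whereas InterlockedAdd returns the result of it; with an addend of zero the two are the same.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: read an atomic variable through a read-modify-write
// operation that adds zero, so the read itself is a full atomic operation.
inline long read_via_rmw(std::atomic<long>& value) {
    return value.fetch_add(0);  // value before the (no-op) addition
}

inline void read_via_rmw_example() {
    std::atomic<long> v{42};
    assert(read_via_rmw(v) == 42);  // the read observes the stored value
    assert(v.load() == 42);         // and leaves it unchanged
}
```

The cost is that every read becomes a locked read-modify-write, which is likely why the idiom is uncommon.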

A better practice for OpenMP and Interlocked... is this: when updating your atomic variable, assign the new value to a variable that is private to the thread, and read that value within your thread rather than the memory location of the original variable. (Note: do not read the atomic variable into the private variable, update the private variable, and assign it back to the original, as that would be an atomicity violation.)

"I think" the best solution is to use the atomic<> construct provided by TBB (or something similar). And I believe the new constructs being worked into the upcoming C/C++ specs to support atomics will also be good.

Dmitry Vyukov:

Ok, thank you, so Inspector works on the binary level.

Dmitry Vyukov:

Won't such an approach give too many false positives?
Every "CAS loop" contains a read of the variable... And many algorithms are based on just atomic reads and writes...

Re: "I think" the best solution is to use the atomic<> construct provided by TBB (or something similar). And I believe the new constructs being worked into the upcoming C/C++ specs to support atomics will also be good.

But std::atomic<>::load() will emit a non-LOCKed load of the data, so Inspector will report a race, right?
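
To make the "CAS loop" pattern concrete, here is a minimal sketch using C++11 std::atomic (the function name atomic_store_max is hypothetical). The loop starts with an atomic load, which on IA-32/Intel 64 compilers typically emit as an ordinary MOV without a LOCK prefix, followed by a compare-and-exchange, which is typically a LOCK-prefixed instruction. That pairing is the situation the question above describes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical CAS loop: atomically raise `target` to at least `candidate`.
inline long atomic_store_max(std::atomic<long>& target, long candidate) {
    long observed = target.load();  // the read that starts every CAS loop
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // On failure, compare_exchange_weak refreshes `observed` with the
        // current value, and the loop condition is evaluated again.
    }
    return observed;  // last value observed in `target` by this call
}

inline void atomic_store_max_example() {
    std::atomic<long> high_water{10};
    atomic_store_max(high_water, 25);
    assert(high_water.load() == 25);  // raised to the larger candidate
    atomic_store_max(high_water, 5);
    assert(high_water.load() == 25);  // smaller candidate leaves it unchanged
}
```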

Eric W Moore (Intel):

Intel Parallel Inspector is likely to report a data race when the user calls std::atomic::load(), depending on how the compiler/library implements that function on IA-32/Intel 64. I believe that the engineers are still researching, confirming, and debating the appropriate instructions to implement load() on IA-32/Intel 64.

From a language/library perspective, using C++ std::atomic::load() will eventually be safe. If you look at the assembly and it uses a single instruction to load the memory (i.e., the compiler isn't doing something illegal/bad), then you can safely suppress the data race warning in Intel Parallel Inspector using the suppression feature.

As to the frequency of false positives, I cannot guess. But note: just because somebody is using atomics does not mean they did it right. There are both architecture and compiler issues, so just looking at the assembly for the Interlocked function or omp atomic statement is not enough; you need to make sure the compiler did the "right" thing for all the surrounding variables (do you require sequential consistency, for example?). Semantically, with C++0x or TBB atomics it is easier to make sure that every read/write is safe. With Win32 and OpenMP, you must use the Interlocked... functions or omp atomic for every access, or else it won't be atomic.