Support for std::atomic to enable data race detection

Support for std::atomic to enable data race detection

Hello,

We are trying to verify some of our programs that use TBB for data races using ThreadSanitizer data race detector (https://clang.llvm.org/docs/ThreadSanitizer.html). However, currently we are seeing lots of data race reports inside of TBB library itself which makes further verification impossible. See below for examples. The root problem is that TBB uses home-grown implementation of atomic operations which is not understood by ThreadSanitizer.

Have you considered implementing TBB's atomic on top of language/compiler-provided atomic operations? If not, I would like to make a feature request for this. This can probably help with testing of TBB itself as well.

On implementation level this is just a matter of mapping one interface onto another and can use either C++'s std::atomic, C's atomic operations or the new style __atomic compiler built-ins (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html) (supported by both clang and gcc). Any of these will work for ThreadSanitizer. This probably should be a build-time option enabled when building at least for ThreadSanitizer.

Thanks

---

WARNING: ThreadSanitizer: data race (pid=720823)
Read of size 8 at 0x7b30000093b0 by thread T10:
#0 load_with_acquire tbb/include/tbb/tbb_machine.h:613:23
#1 __TBB_load_with_acquire<long> tbb/include/tbb/tbb_machine.h:708
#2 operator tbb::internal::rml::private_worker * tbb/include/tbb/atomic.h:301
#3 propagate_chain_reaction tbb/src/tbb/private_server.cpp:161
#5 tbb::internal::rml::private_worker::thread_routine(void*) tbb/src/tbb/private_server.cpp:228:11

Previous write of size 8 at 0x7b30000093b0 by thread T11:
#0 store_with_release tbb/include/tbb/tbb_machine.h:619:18
#1 __TBB_store_with_release<long, long> tbb/include/tbb/tbb_machine.h:712
#2 store_with_release tbb/include/tbb/atomic.h:328
#3 operator= tbb/include/tbb/atomic.h:492
#4 tbb::internal::rml::private_server::wake_some(int) tbb/src/tbb/private_server.cpp:391
#5 propagate_chain_reaction tbb/src/tbb/private_server.cpp:162:13
#6 tbb::internal::rml::private_worker::run() tbb/src/tbb/private_server.cpp:266
#7 tbb::internal::rml::private_worker::thread_routine(void*) tbb/src/tbb/private_server.cpp:228:11

WARNING: ThreadSanitizer: data race (pid=57663)
Read of size 8 at 0x7b7000000c08 by thread T5:
#0 tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::receive_or_steal_task(long&) tbb/src/tbb/custom_scheduler.h:240:35
#1 tbb::internal::arena::process(tbb::internal::generic_scheduler&) tbb/src/tbb/arena.cpp:154:21
#2 tbb::internal::market::process(rml::job&) tbb/src/tbb/market.cpp:677:16
#3 tbb::internal::rml::private_worker::run() tbb/src/tbb/private_server.cpp:275:23
#4 tbb::internal::rml::private_worker::thread_routine(void*) tbb/src/tbb/private_server.cpp:228:11

Previous write of size 8 at 0x7b7000000c08 by main thread:
#0 tbb::internal::machine_load_store_relaxed<tbb::task**, 8ul>::store(tbb::task** volatile&, tbb::task**) tbb/include/tbb/internal/../tbb_machine.h:687:18
#1 void tbb::internal::__TBB_store_relaxed<tbb::task**, tbb::task**>(tbb::task** volatile&, tbb::task**) tbb/include/tbb/internal/../tbb_machine.h:738
#2 tbb::internal::generic_scheduler::leave_task_pool()
#3 tbb::internal::generic_scheduler::cleanup_master(bool) tbb/src/tbb/scheduler.cpp:1224
#4 tbb::internal::governor::terminate_scheduler(tbb::internal::generic_scheduler*, tbb::task_scheduler_init const*) tbb/src/tbb/governor.cpp:229:12
#5 tbb::task_scheduler_init::terminate() tbb/src/tbb/governor.cpp:385
#6 tbb::task_scheduler_init::~task_scheduler_init() tbb/include/tbb/task_scheduler_init.h:124:13

8 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

ping

Hi Dmitry,

Sorry for the late response. You can enable compiler specific builtins with TBB_USE_GCC_BUILTINS or TBB_USE_ICC_BUILTINS define macros. Try this and tell what you got, because something that you need may be missed.

Thanks,

Nikita

Great! Thanks, will try.

Number of races greatly reduced with TBB_USE_GCC_BUILTINS, however we still see few like:

WARNING: ThreadSanitizer: data race (pid=608931)
  Read of size 8 at 0x7b3000009430 by thread T11:
    #0 load_with_acquire tbb_machine.h:613:23
...
  Previous write of size 8 at 0x7b3000009430 by thread T10:
    #0 store_with_release tbb_machine.h:619:18

 

 

It seems we are hitting this case in machine/gcc_generic.h:

// TODO: implement with __atomic_* builtins where available
#define __TBB_USE_GENERIC_HALF_FENCED_LOAD_STORE            1
#define __TBB_USE_GENERIC_RELAXED_LOAD_STORE                1
#define __TBB_USE_GENERIC_SEQUENTIAL_CONSISTENCY_LOAD_STORE 1

 

We are also using clang, and it seems the header file only checks gcc version when selecting between old and new atomics. My clang 5.0 pretends that it is a very old gcc, so it probably gets the old builtins which can't express relaxed stores, etc. I think we also need something along these lines:

#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

#if __TBB_GCC_VERSION < XXXXX || __has_builtin(__atomic_load_n)
  ... use new builtins ...
#endif

 

 

 

 

To make it clear, we only care about the case when compiler support the new atomics and the new atomics can directly express relaxed/acquire/release store/loads. Compilers that don't have new atomics, don't support ThreadSanitizer either (and are just too old to bother).

Thanks Dmitry, we will take a look. And if you know some commonly used macro that can be checked for presence of new atomic built-ins, I will appreciate if you share with us. The __has_builtin() is an option, but maybe there is some feature macro available.

I don't know what's the right __has_feature value for this. After searching internet I see that some code just checks __has_builtin for all atomic function, e.g.:

https://code.woboq.org/llvm/libcxx/src/include/atomic_support.h.html

and some code checks __has_feature(c_atomic), e.g.:

https://gist.github.com/galek/68480cb04b432443cbad

However, I think strictly saying __has_feature(c_atomic) tells about presence of __c11_atomic_* functions, which is a parallel set of functions with the different prefix.

I would say __has_builtin(__atomic_load_n) should work well in practice.

Leave a Comment

Please sign in to add a comment. Not a member? Join today