I was wondering how the internals of TBB work and how I should choose the storage duration of the objects in my code.
Assume I'm writing C++11 and using a lot of parallel_* constructs such as parallel_for or parallel_for_each. How does TBB handle the same object being accessed by multiple threads, and how does it avoid cache misses? Is it a good idea to make everything I pass into TBB functions thread_local?
For example, imagine I have a function foo() that returns a pseudo-random int every time it is called. Inside foo() there are two main objects: the standard Mersenne Twister engine, std::mt19937, and a std::uniform_int_distribution. Is it a good idea to use thread_local here, or is something else better? What about data races and cache misses in TBB?
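
To make the question concrete, here is a minimal sketch of the kind of foo() I mean. The range [0, 99] and the seeding through std::random_device are placeholder assumptions for illustration, not part of my real code:

```cpp
#include <random>

// Hypothetical foo(): returns a pseudo-random int in [0, 99].
// The range and the seeding strategy are assumptions for this sketch.
int foo() {
    // With thread_local, each thread gets its own engine and its own
    // distribution, constructed on the first call made from that thread,
    // so no two threads ever touch the same generator state.
    thread_local std::mt19937 engine{std::random_device{}()};
    thread_local std::uniform_int_distribution<int> dist{0, 99};
    return dist(engine);
}
```

The idea would be to call foo() from the body passed to tbb::parallel_for, so that every worker thread ends up using its own generator.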