I have some event-based simulation code, with processes subscribing to events and being triggered accordingly. The simulation proceeds in cycles, which means that if a process asks to subscribe to an event, or if an event asks to wake up a process, this does not need to be reflected in the current cycle. In each cycle the processes are executed in parallel using TBB, then the simulator does some book-keeping serially.
This looks like a good case for thread-local storage: each thread will log all subscribe and trigger (wake-up) requests locally, and then the simulator will process the logs serially in its book-keeping phase.
Alternatively, non-blocking data structures could be used to process the requests "on the fly" inside the parallel section.
Which alternative should I choose?
If I go for thread-local storage, how expensive is the overhead of an enumerable_thread_specific container? That is, I could have a single enumerable_thread_specific<T> the_storage, where T contains everything a thread needs, so that any object needing its data has to go all the way to the_storage; or I could put a small enumerable_thread_specific container in each object that works with the data.