Direct support for double-check worth the trouble?

I suspect blogs are like poetry - more are written than read. I've had this specific posting lost twice now by the system, so I'll need a total of three readers to break even.

I'm the lead developer for Intel® Threading Building Blocks (Intel® TBB). I've been pondering whether TBB should have more direct support for the double-check pattern. Scott Meyers and Andrei Alexandrescu's article http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004.pdf explains why it is hard to get right. In TBB, the correct memory fences for it can be implied with the atomic<T> template, by writing it like this:

 // At file scope
 tbb::atomic<int> resource_is_ready;
 tbb::mutex initialization_mutex;

 // At function scope
 if( !resource_is_ready ) {
     tbb::mutex::scoped_lock lock(initialization_mutex);
     if( !resource_is_ready ) {
         ...initialize resource...
         resource_is_ready = true;
     }
 }
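For readers who want something they can compile without TBB, here is a minimal sketch of the same pattern. It assumes the standard library's std::atomic and std::mutex as stand-ins for tbb::atomic and tbb::mutex; the names get_resource, resource_value, and initialization_count are hypothetical and exist only for this illustration. The explicit acquire/release orderings spell out the fences that the fragment above leaves implicit.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Stand-ins for the post's tbb::atomic<int> and tbb::mutex (an assumption).
static std::atomic<bool> resource_is_ready{false};
static std::mutex initialization_mutex;

static int resource_value = 0;         // hypothetical lazily initialized resource
static int initialization_count = 0;   // instrumentation for this example only

int get_resource() {
    // First check, without the lock. The acquire load pairs with the
    // release store below, so a true flag implies the resource is visible.
    if (!resource_is_ready.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(initialization_mutex);
        // Second check, under the lock: another thread may have won the race.
        if (!resource_is_ready.load(std::memory_order_relaxed)) {
            resource_value = 42;       // ...initialize resource...
            ++initialization_count;
            resource_is_ready.store(true, std::memory_order_release);
        }
    }
    // Safe to read here: the write happened before the release store we observed.
    assert(initialization_count == 1);
    return resource_value;
}
```

Every call after the first takes only the fast path: one atomic load and no lock.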

We could add more direct support, but I'm undecided on whether it is worth the trouble. If we add more direct support, the above fragment might look something like:

 // At file scope
 tbb::one_time my_is_ready;

 // At function scope
 TBB_ONE_TIME( my_is_ready ) {
     ...initialize resource...
     my_is_ready.mark_done();
 }


where TBB_ONE_TIME would be a macro that expands to something like "if( T _tbb_internal_var(ready) )", where T is a type internal to TBB. The reason for the temporary variable in the "if" would be to provide exception safety. If the destructor for _tbb_internal_var were called before the call to my_is_ready.mark_done(), then the implementation would know that an exception had been thrown and the resource was not successfully initialized.
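To make the exception-safety idea concrete, here is one way such a macro could be sketched. Everything in it is hypothetical: one_time, one_time_guard, and ONE_TIME_SKETCH are illustration names, not TBB APIs, and the standard library types stand in for TBB's. This sketch also takes a slightly simpler route than the one described above: the guard's destructor only releases the mutex, and because the flag is set only by mark_done(), an exception thrown before that call leaves the flag clear so the next caller retries.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>

class one_time {                        // hypothetical, not a TBB type
    std::atomic<bool> done_{false};
    std::mutex mutex_;
    friend class one_time_guard;
public:
    void mark_done() { done_.store(true, std::memory_order_release); }
    bool is_done() const { return done_.load(std::memory_order_acquire); }
};

class one_time_guard {                  // plays the role of the internal type T
    one_time& flag_;
    std::unique_lock<std::mutex> lock_; // owns the mutex only on the slow path
    bool run_body_;
public:
    explicit one_time_guard(one_time& f) : flag_(f), run_body_(false) {
        if (!flag_.is_done()) {                         // first check
            lock_ = std::unique_lock<std::mutex>(flag_.mutex_);
            run_body_ = !flag_.is_done();               // second check
        }
    }
    // True only when the caller must run the initialization body.
    explicit operator bool() const { return run_body_; }
    // Implicit destructor unlocks; if the body threw, done_ is still false.
};

#define ONE_TIME_SKETCH(flag) if (one_time_guard one_time_guard_var_{flag})

// Usage example with hypothetical names.
static one_time config_once;
static int attempts = 0;
static int config_value = 0;

void init_config(bool fail) {
    ONE_TIME_SKETCH(config_once) {
        ++attempts;
        if (fail) throw std::runtime_error("initialization failed");
        config_value = 42;
        config_once.mark_done();
    }
    assert(config_once.is_done());      // reached only after a successful init
}
```

A failed first attempt leaves the flag clear, so a later call to init_config runs the body again instead of seeing a half-built resource.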

While the more direct support would save some typing, it might add to the learning curve for TBB. So is the direct support worth the trouble? Is there a better way to provide the direct support? Or does the first pattern that I showed suffice?

- Arch

P.S. While writing the earlier draft that was lost, I discovered that we accidentally omitted atomic<bool> from TBB. That's something we plan to fix.


3 comments

Arch D. Robison (Intel)

The current specification of the memory model, Chapter 7 of Volume 3A (http://www.intel.com/products/processor/manuals/index.htm), is vague and confusing. It takes the hardware designer's viewpoint of what happens before instructions are retired, and indeed a lot of speculation is permitted before instructions retire. It's vague on whether reads are permitted to retire out of order.

It happens that on all current x86 implementations (including AMD's as far as I know), ordinary reads complete in order. Hence double check happens to work without an lfence. Whether this behavior should be canonized is an active topic of discussion. There's a general suspicion that if a processor really did reorder reads, enough software would break to make the processor unpopular. This is particularly important for IA-32, because its big selling point is preserving software investment. I remind myself of that every time I start thinking about how archeological the x86 instruction set looks.

So in principle an lfence is required between the read of the flag and use of the resource. But in "common law" practice, the lfence can be omitted.
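As a rough illustration of where that fence sits, here is a sketch that uses C++11 standard fences as a portable stand-in for lfence (an assumption; the discussion above is about the hardware instruction itself). The names publish and try_read are hypothetical, and lazy_variable and resource_is_ready are borrowed from the comment below. On x86, compilers typically emit no instruction for the acquire fence and only refrain from reordering across it, which fits the "common law" observation.

```cpp
#include <atomic>
#include <cassert>

static std::atomic<int> resource_is_ready{0};
static int lazy_variable = 0;

// Writer side: set the value, then raise the flag.
void publish(int value) {
    assert(resource_is_ready.load(std::memory_order_relaxed) == 0); // publish once
    lazy_variable = value;
    std::atomic_thread_fence(std::memory_order_release);
    resource_is_ready.store(1, std::memory_order_relaxed);
}

// Reader side: check the flag, fence, then use the resource.
bool try_read(int& out) {
    if (resource_is_ready.load(std::memory_order_relaxed) == 0) {
        return false;
    }
    // The fence "in principle" required between reading the flag and
    // using the resource.
    std::atomic_thread_fence(std::memory_order_acquire);
    out = lazy_variable;
    return true;
}
```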

anonymous

Certainly providing the pattern is a good idea. Most people get it wrong when they code it themselves.

One thing I've always wondered is why branch prediction/speculative execution does not foil DCL on multi-processor systems.

Thread A on CPU1:
Fetch resource_is_ready
Speculate resource_is_ready = true
Fetch lazy_variable value
Fetch of lazy_variable completes
Makes use of lazy variable

Next, Thread B on CPU2:
Sets lazy_variable value
Sets resource_is_ready

Back to Thread A on CPU1:
Fetch of resource_is_ready completes (reads can complete out-of-order?)
Resource_is_ready == true --> speculation was correct!
[But lazy_variable was already read and used]

I guess my question is how does the lock of the mutex within the if statement serve as a barrier to reordering of reads of the lazy_variable to before the resource_is_ready check, if the lock instruction is never executed due to branch prediction?

anonymous

Arch,

I don't think direct support is warranted. I think Double Checked Locking is wonderful, and I'd address it in library scope in one of two possible ways:
* Add an example implementation so that users may copy and use.
* Add a template class DoubleCheckedLockSingleton<T>, which implements all the syntactic magic, so that a developer just uses it and gets it right. I'd implement this template class using a concrete class with opaque pointers, so that the template code provides only type-safe inline wrappers.

Joseph
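The template class suggested in the comment above might look roughly like the sketch below. This is only one reading of the suggestion, not an existing TBB class: DoubleCheckedLockBase and Config are hypothetical names, and the standard library's std::atomic and std::mutex are assumed in place of TBB's types. The non-template base holds an opaque void* and all of the locking logic, while the template layer adds only the type-safe cast.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Concrete (non-template) core: the double-check logic is compiled once.
class DoubleCheckedLockBase {
    std::atomic<void*> instance_{nullptr};
    std::mutex mutex_;
public:
    void* get(void* (*create)()) {
        void* p = instance_.load(std::memory_order_acquire);   // first check
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mutex_);
            p = instance_.load(std::memory_order_relaxed);     // second check
            if (p == nullptr) {
                p = create();
                instance_.store(p, std::memory_order_release);
            }
        }
        assert(p != nullptr);
        return p;
    }
};

// Thin template layer: supplies the factory and the cast, nothing else.
template <typename T>
class DoubleCheckedLockSingleton {
    static DoubleCheckedLockBase base_;
    static void* create() { return new T(); }  // never deleted in this sketch
public:
    static T& instance() { return *static_cast<T*>(base_.get(&create)); }
};

template <typename T>
DoubleCheckedLockBase DoubleCheckedLockSingleton<T>::base_;

// Hypothetical payload type for the usage example.
struct Config {
    int value = 7;
};
```

A caller would write DoubleCheckedLockSingleton<Config>::instance() and get the same Config object on every call.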
