Threading Building Blocks Atomic Operations: Introduction

I've been curious about Threading Building Blocks atomic operations ever since I learned that they exist. In my 14 years of developing multithreaded applications on Unix and Windows systems, I never came across that kind of construct. Does something similar exist in traditional threading libraries, but I just didn't know about it because I was working at a much higher level as I threaded previously unthreaded applications?

James Reinders begins his discussion of atomic operations in his book "Intel Threading Building Blocks" as follows:

Atomic operations are a fast and relatively easy alternative to mutexes. They do not suffer from the deadlock and convoying problems [that are possible with mutexes]

How could such a thing be possible? That's what I wondered, having years of experience working with mutexes.

TBB atomic operation benefits and limitations

Well, the answer is a bit complicated. Atomic operations do appear to represent a "freebie" in terms of multithreaded software development, in that they are a simple means to guarantee thread safety for code that would normally not be threadsafe. But, there is a caveat: atomic operations are severely limited in terms of the situations where they can be applied:

The main limitation of atomic operations is that they are limited in current computer systems to fairly small data sizes: the largest is usually the size of the largest scalar, often a double-precision floating-point number.

In the applications I've worked on, operations at that fine a granularity were rarely what made or broke performance. I was typically able to achieve sufficient multithreading by finding an appropriate loop that could be parallelized to fully utilize the available processors.

My interest in Threading Building Blocks atomic operations was increased when I saw that the TBB-threaded version of Intel's Destroy the Castle uses TBB atomic operations in many places. Naturally, I wondered why.

The Threading Building Blocks Tutorial (available on the TBB Documentation page) tells us:

When a thread performs an atomic operation, the other threads see it as happening instantaneously. The advantage of atomic operations is that they are relatively quick compared to locks, and do not suffer from deadlock and convoying. ... you should not pass up an opportunity to use an atomic operation in place of mutual exclusion. ... A classic use of atomic operations is for thread-safe reference counting.
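To make the reference-counting use case concrete, here is a minimal sketch. It uses std::atomic so the snippet compiles stand-alone (TBB's atomic<T> predates std::atomic and behaves the same way for this pattern; the TBB spelling is noted in comments). The names RefCounted and hammer_refcount are illustrative, not from TBB.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Sketch of thread-safe reference counting with an atomic counter.
// With tbb::atomic<int>, fetch_add below would be fetch_and_add.
struct RefCounted {
    std::atomic<int> refs{1};          // the owner holds the first reference

    void add_ref() {
        refs.fetch_add(1);             // atomic increment, no mutex needed
    }
    // Returns true when the last reference was just dropped.
    bool release() {
        return refs.fetch_add(-1) == 1;
    }
};

// Many threads add and drop references concurrently; the count stays exact.
inline bool hammer_refcount(int n_threads, int iters) {
    RefCounted obj;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                obj.add_ref();
                obj.release();
            }
        });
    for (auto& w : workers) w.join();
    return obj.release();              // drop the owner's reference: hits zero
}
```

Because the increment and decrement are single atomic read-modify-write operations, no lock is ever taken, yet the final count is always exact.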

Fundamental operations on atomic<T> variables

I'll conclude this post with a list of the fundamental operations that can be applied to a variable x of type atomic<T> (from the TBB Tutorial):

= x : read the value of x
x = : write the value of x, and return it
x.fetch_and_store(y) : do x=y and return the old value of x
x.fetch_and_add(y) : do x+=y and return the old value of x
x.compare_and_swap(y,z) : if x equals z, then do x=y; in either case, return the old value of x

Because these operations happen atomically, they can be used safely without mutual exclusion.
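The five operations above can be sketched in a few lines. This example uses std::atomic so it compiles stand-alone; the corresponding TBB spellings are noted in the comments, and the one real difference (compare_and_swap returns the old value, while std::atomic's compare_exchange_strong returns a bool) is called out.

```cpp
#include <atomic>
#include <cassert>

// Walk through the fundamental atomic operations on a variable x.
inline int atomic_ops_demo() {
    std::atomic<int> x{10};

    int v = x.load();                 // TBB: "= x"  (read)
    assert(v == 10);

    x.store(20);                      // TBB: "x ="  (write)

    int old1 = x.exchange(30);        // TBB: x.fetch_and_store(30)
    assert(old1 == 20);

    int old2 = x.fetch_add(5);        // TBB: x.fetch_and_add(5)
    assert(old2 == 30 && x.load() == 35);

    // TBB: x.compare_and_swap(100, 35) -- if x == 35, set x = 100 and
    // return the old value. The std:: flavor returns a bool instead.
    int expected = 35;
    x.compare_exchange_strong(expected, 100);
    return x.load();
}
```

Each call is a single indivisible read-modify-write, which is why no mutex is needed around any of them.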

At first glance, this set of operations can seem limiting. But it's fairly easy for me to think back on 14 years of coping with parallelizing code that was written for a single processor and come up with multiple situations where the existence of an atomic operation would likely have been beneficial, saving significant programming and debugging grief compared with traditional threading.

Naturally, I intend to investigate TBB's atomic operations further!

Kevin Farnham
O'Reilly Media
TBB Open Source Community



robert-reed:

The simple rule for atomics is: what can you fit into a read-modify-write cycle? That's all you get with an atomic. Reference counting is a great use. Apply a lock prefix to a memory-destination add or subtract, and you guarantee proper sequencing to safely adjust the count. The same principle works with the classic P&V operators.

Atomics are not a panacea. If you're doing a reduction operation, it still pays to use local reduction variables and then use atomic operations just for merging the partial reductions at the end.
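The reduction advice above can be sketched as follows: each thread accumulates into a private local variable, and the shared atomic is touched only once per thread, for the final merge. std::atomic stands in for tbb::atomic here so the example is self-contained; parallel_sum is an illustrative name, not a library function.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Parallel sum: local reductions per thread, one atomic merge each.
inline long parallel_sum(const std::vector<int>& data, int n_threads) {
    std::atomic<long> total{0};
    std::vector<std::thread> workers;
    std::size_t chunk = (data.size() + n_threads - 1) / n_threads;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&, t] {
            std::size_t lo = t * chunk;
            std::size_t hi = std::min(data.size(), lo + chunk);
            long local = 0;                  // private: no contention here
            for (std::size_t i = lo; i < hi; ++i)
                local += data[i];
            total.fetch_add(local);          // one atomic op per thread
        });
    for (auto& w : workers) w.join();
    return total.load();
}
```

Doing total.fetch_add(data[i]) inside the loop would also be correct, but every iteration would then contend for the same cache line; the local-then-merge pattern keeps the atomic traffic to one operation per thread.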

jseigh:

There's some stuff here

One problem is deciding what operations should be defined. This is complicated by differences between hardware architectures. And by what you imagine you will be needing them for. Where they don't provide the necessary atomic operations, you end up implementing your own anyway. At least that's what I did for my atomically thread-safe reference counting and for hazard pointers that didn't require a store/load memory barrier.

anonymous:

Locks and mutexes are built from atomic operations. You've been using them all along. OpenMP has #pragma omp atomic and Java has java.util.concurrent.atomic.
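The point that locks are themselves built from atomic operations can be illustrated with a minimal test-and-set spinlock whose core is a compare-and-swap (TBB's compare_and_swap, std::atomic's compare_exchange). This SpinLock is a sketch for illustration, not a production lock.

```cpp
#include <atomic>

// A minimal spinlock: the entire lock is one atomic flag plus
// a compare-and-swap loop. Real mutexes add fairness and blocking,
// but the atomic core is the same.
class SpinLock {
    std::atomic<int> locked{0};
public:
    // Atomically flip 0 -> 1; fail if someone else holds the lock.
    bool try_lock() {
        int expected = 0;
        return locked.compare_exchange_strong(expected, 1);
    }
    void lock() {
        int expected = 0;
        while (!locked.compare_exchange_weak(expected, 1))
            expected = 0;   // CAS overwrote expected; reset and retry
    }
    void unlock() { locked.store(0); }
};
```

Because the 0-to-1 transition is a single atomic read-modify-write, exactly one thread can ever observe success at a time, which is precisely the mutual exclusion a mutex provides.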

