Compare and Swap Loop

Problem

Atomically update a scalar value so that a predicate is satisfied.

Context

Often a shared variable must be updated atomically, by a transform that maps its old value to a new value. The transform might be a transition of a finite state machine, or recording global knowledge. For instance, the shared variable might be recording the maximum value that any thread has seen so far.

Forces

  • The variable is read and updated by multiple threads.

  • The hardware implements "compare and swap" for a variable of that type.

  • Protecting the update with a mutex is to be avoided.

Related

  • Reduction
  • Reference Counting

Solution

The solution is to atomically snapshot the current value, and then use atomic<T>::compare_and_swap to update it. Retry until the compare_and_swap succeeds. In some cases it may be possible to exit before the compare_and_swap succeeds because the current value meets some condition.

The template below does the update x=f(x) atomically.

// Atomically perform x=f(x).
template<typename F, typename T>
void AtomicUpdate( atomic<T>& x, F f ) {
   int o;
   do {
       // Take a snapshot
       o = x;
       // Attempt to install new value computed from snapshot
   } while( x.compare_and_swap(f(o),o)!=o );
}

It is critical to take a snapshot and use it for intermediate calculations, because the value of X may be changed by other threads in the meantime.

The following code shows how the template might be used to maintain a global maximum of any value seen by RecordMax.

// Atomically perform UpperBound = max(UpperBound,y) 
void RecordMax( int y ) {
   extern atomic<int> UpperBound;
   AtomicUpdate(UpperBound, [&](int value){return std::max(value,y);} );
}

When y is not going to increase UpperBound, the call to AtomicUpdate will waste time doing the redundant operation compare_and_swap(o,o). In general, this kind of redundancy can be eliminated by making the loop in AtomicUpdate exit early if f(o)==o. In this particular case where F==std::max<int>, that test can be further simplified. The following custom version of RecordMax has the simplified test.

// Atomically perform UpperBound =max(UpperBound,y) 
void RecordMax( int y ) . .
   extern atomic<int> UpperBound;
   do {
       // Take a snapshot
       int o = UpperBound;
       // Quit if snapshot meets condition.
       if( o>=y ) break;
       // Attempt to install new value.
   } while( UpperBound.compare_and_swap(y,o)!=o );
}

Because all participating threads modify a common location, the performance of a compare and swap loop can be poor under high contention. Thus the applicability of more efficient patterns should be considered first. In particular:

  • If the overall purpose is a reduction, use the reduction pattern instead.

  • If the update is addition or subtraction, use atomic<T>::fetch_and_add. If the update is addition or subtraction by one, use atomic<T>::operater++ or atomic<T>::operator--. These methods typically employ direct hardware support that avoids a compare and swap loop.

CAUTION

When using compare_and_swap to update links in a linked structure, be sure you understand if the "ABA problem" is an issue. See the Internet for discourses on the subject.

For more complete information about compiler optimizations, see our Optimization Notice.