Volatile: Almost Useless for Multi-Threaded Programming

There is a widespread notion that the keyword volatile is good for multi-threaded programming. I've seen interfaces with volatile qualifiers justified as "it might be used for multi-threaded programming". I thought volatile was useful until the last few weeks, when it finally dawned on me (or, if you prefer, got through my thick head) that volatile is almost useless for multi-threaded programming. I'll explain here why you should scrub most of it from your multi-threaded code.

Hans Boehm points out that there are only three portable uses for volatile. I'll summarize them here:

    • marking a local variable in the scope of a setjmp so that the variable does not roll back after a longjmp.

    • memory that is modified by an external agent or appears to be because of a screwy memory mapping

    • signal handler mischief



None of these mention multi-threading. Indeed, Boehm's paper points to a 1997 comp.programming.threads discussion where two experts said it bluntly:

"Declaring your variables volatile will have no useful effect, and will simply cause your code to run a *lot* slower when you turn on optimisation in your compiler." - Bryan O'Sullivan

"...the use of volatile accomplishes nothing but to prevent the compiler from making useful and desirable optimizations, providing no help whatsoever in making code 'thread safe'." - David Butenhof


If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:

    1. atomicity

    2. memory consistency, i.e. the order of a thread's operations as seen by another thread.



Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 128-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.

Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:

    volatile int Ready;
    int Message[100];
    void foo( int i ) {
        Message[i/10] = 42;
        Ready = 1;
    }


It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.

You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium™ processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power™ will reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Paul McKenney's article "Memory Ordering in Modern Microprocessors" explains them. Sufficient for discussion here is that volatile has nothing to do with memory fences.

So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:

    • POSIX threads

    • Windows™ threads

    • OpenMP

    • TBB



For example, the parallel reduction template in TBB does all the right fences so you don't have to worry about them.

I spent part of this week scrubbing volatile from the TBB task scheduler. We were using volatile for memory fences because version 1.0 targeted only x86 and Itanium. For Itanium, volatile did imply memory fences. And for x86, we were just using one compiler, and catering to it. All atomic operations were in the binary that we compiled. But now with the open source version, we have to pay heed to other compilers and other chips. So I scrubbed out volatile, replacing it with explicit load-with-acquire and store-with-release operations, or in some cases plain loads and stores. Those operations themselves are implemented using volatile, but that's largely for Itanium's sake. Only one volatile remained, ironically on an unshared local variable! See file src/tbb/task.cpp in the latest download if you're curious about the oddball survivor.
- Arch


Comments

Arch D. Robison (Intel):

The committees indeed think very hard about this sort of thing. Adding fence semantics to volatile was considered by the C++ committee. See N2016 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html) for why adding fence behavior ("inter-thread visibility") to volatile was rejected.

Instead, C++ 200x has support for fencing via its atomic operations library. See Chapter 29 of the working draft (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2606.pdf). With that library, the example can be written correctly as:

    std::atomic_bool Ready;
    int Message[100];
    void foo( int i ) {
        Message[i/10] = 42;
        Ready = 1;
    }

Intel Threading Building Blocks (http://www.threadingbuildingblocks.org) has a class tbb::atomic<bool> that similarly makes the example work. Disclaimer: that is a shameless plug from TBB's architect :-)

anonymous:

> void foo( int i ) {
>   Message[i/10] = 42;
>   Ready = 1;
> }
> The reordering is not a compiler bug.

In my opinion the reordering described IS a compiler bug. It's just stupid and dangerous, so it's gcc's fault. The compiler should NEVER do that with volatile. Reads and writes of volatiles should ALWAYS stay in place. Someone must have had a reason to declare the variable volatile. Preceding and following code blocks can be reordered as much as the compiler wants, but one should at least be able to expect that preceding code WAS executed and following code WAS NOT. Any other approach is just bad. For me the only option is to ignore the fact that one bunch of people wasn't thinking when designing (damn committees) and another bunch wasn't thinking when writing the compiler. I'm happy that at least I can turn optimization off in functions accessing volatiles, but that doesn't excuse the stupidity of compilers ignoring the volatile qualifier and doing whatever they want. I'm not talking here about other things such as hardware caches or memory issues, but I have no idea who gave anyone the right to consider volatiles in ANY kind of optimization?! That kind of data should definitely be excluded from it entirely and unconditionally.

Arch D. Robison (Intel):

Yes, the above should work given the assertions. At worst it can hoist the read of "abort" above job.doChunk(), which presumably does no damage in an example like this.

anonymous:

Well, how about a worker thread repeatedly checking whether the job has been cancelled? Assuming that bool reads and writes are atomic, will the following C++ snippet work as expected?
    class WorkThread
    {
        volatile bool abort;
    public:
        void run()
        {
            ...
            abort = false;
            while( job.notFinished() )
            {
                job.doChunk();
                if( abort )
                    return;
            }
            ...
        }
        void cancel()
        {
            abort = true;
        }
    };
I know I will still need a mutex/waitcondition or similar to synchronize threads, but it would be shocking to find out that the above code could be executed in some arbitrary order.

regehr:

Thanks for the comments!

Definitely structs and arrays would be great to test.

That DMR essay is great -- I wonder how "restrict" snuck back in? I heard of a great study (don't have a reference handy unfortunately) where someone profiled programs' memory behavior in order to add in a maximal amount of restrict qualifiers, then recompiled and got no speedup at all :)

I think it is not hard to argue against all uses of volatile. As you say, it's a poor choice for communication and synchronization between threads. Register accesses can be done through function calls to asm stubs. That doesn't seem to leave many uses...

Anyway thanks for the example about moving memory operations past volatile operations, a friend of mine who works at a major embedded systems company didn't believe that this would ever be done by a compiler until I pointed him to this blog post.

Arch D. Robison (Intel):

I liked the paper. Before I thought volatile was useless; now I'm scared of it :-)

What about checking volatile on fields and structures? A compiler could forget to propagate a volatile qualifier on a struct to the fields inside. Likewise for a volatile array (as opposed to an array of volatile elements). The restrict keyword from C99 might offer further mischief; e.g., there may be transforms so focused on restrict that they forget about volatile. Casts and inlining offer other possibilities for compiler error when the "to" and "from" types differ in their volatile qualifiers.

One of the inventors of C (Dennis Ritchie) was against volatile (and const). See here (http://www.lysator.liu.se/c/dmr-on-noalias.html).

regehr:

Also here's an example (oddly, icc gets it right):

[regehr@babel ~]$ cat > foo.c
volatile int x;
void foo (void)
{
x;
}
[regehr@babel ~]$ icpc -S foo.c
[regehr@babel ~]$ cat foo.s
# -- Machine type IA32
# mark_description "Intel(R) C++ Compiler for applications running on IA-32, Version 10.1 Build 20070913 %s";
# mark_description "-S";
.file "foo.c"
.text
..TXTST0:
# -- Begin _Z3foov
# mark_begin;
.align 2,0x90
.globl _Z3foov
_Z3foov:
..B1.1: # Preds ..B1.0
ret #5.1
.align 2,0x90
# LOE
# mark_end;
.type _Z3foov,@function
.size _Z3foov,.-_Z3foov
.data
# -- End _Z3foov
.bss
.align 4
.align 4
.globl x
x:
.type x,@object
.size x,4
.space 4 # pad
.data
.section .note.GNU-stack, ""
# End
[regehr@babel ~]$

regehr:

Arch-- just wanted to point you to some results that may indicate that volatile is even less useful than one would hope, since compilers tend to not properly respect it.

http://www.cs.utah.edu/~regehr/papers/emsoft08_submit.pdf

John Regehr

anonymous:

link "Hans Boehm points out that there are only three portable uses for volatile" has title but not href
