Volatile: Almost Useless for Multi-Threaded Programming

There is a widespread notion that the keyword volatile is good for multi-threaded programming. I've seen interfaces with volatile qualifiers justified as "it might be used for multi-threaded programming". I thought it was useful too, until the last few weeks, when it finally dawned on me (or if you prefer, got through my thick head) that volatile is almost useless for multi-threaded programming. I'll explain here why you should scrub most of it from your multi-threaded code.

Hans Boehm points out that there are only three portable uses for volatile. I'll summarize them here:

    • marking a local variable in the scope of a setjmp so that the variable does not roll back after a longjmp.

    • memory that is modified by an external agent or appears to be because of a screwy memory mapping

    • signal handler mischief



None of these mention multi-threading. Indeed, Boehm's paper points to a 1997 comp.programming.threads discussion where two experts said it bluntly:

"Declaring your variables volatile will have no useful effect, and will simply cause your code to run a *lot* slower when you turn on optimisation in your compiler." - Bryan O' Sullivan

"...the use of volatile accomplishes nothing but to prevent the compiler from making useful and desirable optimizations, providing no help whatsoever in making code "thread safe". " - David Butenhof


If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:

    1. atomicity

    2. memory consistency, i.e. the order of a thread's operations as seen by another thread.



Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 129-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.
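
To make the atomicity point concrete, here is a minimal sketch using C++11's std::atomic, which postdates this post; the volatile increment is still a separate load, add, and store, so two threads can lose updates, while the atomic increment cannot:

    #include <atomic>
    #include <thread>
    #include <iostream>

    volatile int v_count = 0;        // volatile: accesses are not optimized away, but ++ is not atomic
    std::atomic<int> a_count(0);     // std::atomic: increments are guaranteed atomic

    void work() {
        for( int i = 0; i < 100000; ++i ) {
            v_count = v_count + 1;   // separate load, add, store; concurrent increments can be lost
            ++a_count;               // single indivisible read-modify-write
        }
    }

    int main() {
        std::thread t1(work), t2(work);
        t1.join();
        t2.join();
        std::cout << "volatile: " << v_count << "\n"   // frequently less than 200000
                  << "atomic:   " << a_count << "\n";  // always 200000
    }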

Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:

    volatile int Ready;
    int Message[100];

    void foo( int i ) {
        Message[i/10] = 42;
        Ready = 1;
    }


It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.
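
Conceptually, the generated code behaves as if foo had been rewritten like this (an illustration of the transformation, not actual compiler output):

    void foo( int i ) {
        Ready = 1;             // volatile store issued first, overlapping the division below
        Message[i/10] = 42;    // ordinary store; nothing in the language forbids this ordering
    }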

You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium™ processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power™ may reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Paul McKenney's article Memory Ordering in Modern Microprocessors explains them. Sufficient for discussion here is that volatile has nothing to do with memory fences.
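
To illustrate what such ordering looks like, here is the fragment sketched with C++11 atomics, which also postdate this post; the release store and acquire load supply the fences, and neither Ready nor Message needs volatile:

    #include <atomic>

    std::atomic<int> Ready(0);
    int Message[100];

    void foo( int i ) {
        Message[i/10] = 42;
        Ready.store( 1, std::memory_order_release );       // all prior writes become visible first
    }

    void bar( int i ) {
        while( Ready.load(std::memory_order_acquire)==0 )  // reads after this see the message
            ;                                               // spin (a real program would back off or block)
        int m = Message[i/10];
        // ... use m ...
    }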

So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:

    • POSIX threads

    • Windows™ threads

    • OpenMP

    • TBB



For example, the parallel reduction template in TBB does all the right fences so you don't have to worry about them.
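
A minimal sketch of such a reduction (using the lambda form of tbb::parallel_reduce, which is newer than the TBB version discussed in this post):

    #include <tbb/parallel_reduce.h>
    #include <tbb/blocked_range.h>
    #include <functional>
    #include <cstddef>

    double sum( const double a[], std::size_t n ) {
        return tbb::parallel_reduce(
            tbb::blocked_range<std::size_t>(0, n),   // iteration space, split across threads
            0.0,                                     // identity element of the reduction
            [=]( const tbb::blocked_range<std::size_t>& r, double partial ) {
                for( std::size_t i = r.begin(); i != r.end(); ++i )
                    partial += a[i];                 // sum this thread's subrange
                return partial;
            },
            std::plus<double>() );                   // combine partial sums; TBB inserts the needed fences
    }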

I spent part of this week scrubbing volatile from the TBB task scheduler. We were using volatile for memory fences because version 1.0 targeted only x86 and Itanium. For Itanium, volatile did imply memory fences. And for x86, we were just using one compiler, and catering to it. All atomic operations were in the binary that we compiled. But now with the open source version, we have to pay heed to other compilers and other chips. So I scrubbed out volatile, replacing them with explicit load-with-acquire and store-with-release operations, or in some cases plain loads and stores. Those operations themselves are implemented using volatile, but that's largely for Itanium's sake.  Only one volatile remained, ironically on an unshared local variable! See file src/tbb/task.cpp in the latest download if you're curious about the oddball survivor.
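
Sketched very roughly (hypothetical helper names, x86-only, GCC syntax, not the actual TBB code), such wrappers for scalar types amount to a volatile access plus a compiler barrier, since x86 aligned loads and stores already have acquire/release ordering in hardware:

    // Hypothetical x86-only sketch for scalar T; not TBB's real implementation.
    template<typename T>
    inline T load_with_acquire( const volatile T& location ) {
        T value = location;                      // volatile read
        __asm__ __volatile__( "" ::: "memory" ); // compiler barrier: later accesses stay after the load
        return value;
    }

    template<typename T>
    inline void store_with_release( volatile T& location, T value ) {
        __asm__ __volatile__( "" ::: "memory" ); // compiler barrier: earlier accesses stay before the store
        location = value;                        // volatile write
    }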
- Arch


Comments

anonymous:

Regarding Alexandrescu's article: the discussion that Codeplug refers to seems to have missed the part of Alexandrescu's article that says "never use volatile with built-in types."

I'm kind of on the fence about this one--I can see why it might be good to banish volatile to the same dust heap of history where goto currently resides (along with C-style casts, std::auto_ptr, and maybe const, too, depending on who you listen to).

On the other hand, (mis)using volatile in the way Alexandrescu suggests is currently making my life easier by helping me identify shared objects in our codebase. We're introducing TBB-style task parallelism into a huge legacy game engine. By marking top-level shared objects "volatile" and following Alexandrescu's rules of "volatile correctness," I'm able to catch a good number of potential issues at compile time rather than seeing them at runtime (or finding them via grep, which was my other option). It's a hack, but I haven't been able to come up with anything that is less hacky and/or more useful.

Perhaps we could just treat the "volatile==atomic" myth the same way we treat the "const lets the compiler optimize more" myth. Neither keyword does anything magic, nor is either one capable of hinting the compiler in any particularly useful way. But they do serve as useful extensions to the type system that can help a reasonably disciplined, maintainable codebase stay disciplined and maintainable.

anonymous:

Bug Slayer: Your argument is nonsense. You might as well say this, "Assume a platform on which 'volatile' is necessary. On this platform, 'volatile' is necessary. Therefore the argument that volatile is not necessary is nonsense."

You have a "_WRITE_BARRIER_" in your code. You don't specify the semantics of this barrier. Is it something that interacts in some special way with 'volatile' or not? If it isn't, then it can blow up even with volatile. Though it can't be cached in a register, it can be cached elsewhere, say in the CPU's prefetch buffer or in the other CPU's write-posting buffer.

""Variables that can be accessed across threads need to give the compiler a "hint" that they are different. The compiler assumes that variables will be used in a single thread, and optimizes accordingly. Short of turning of all optimization, the only way to address this is to tell the compiler that a variable may change unpredictably...that it is "volatile." No matter how thread-safe your code is otherwise, it won't make up for failure to use volatile where it counts. True, volatile is not a replacement for critical sections; but, neither are critical sections a substitute for using volatile when needed. They address totally different needs.""

This is totally and utterly false. YOU DO NOT NEED TO USE VOLATILE IF YOU USE CRITICAL SECTIONS. Period, end of story. So, no, the compiler does not need a hint. No, the compiler does not assume variables will be used in a single thread. No, volatile is not the only way to tell the compiler that a variable is accessed by other threads.

Answer this simple yes or no question: "If my code accesses all shared variables under the protection of a single mutex, do I still need to declare any shared variables volatile?" If you answer "yes", you're simply factually wrong. If you answer "no", then it refutes about 2/3 of your argument that you "need to give the compiler" a hint.
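
For example, a minimal POSIX threads sketch of that situation (no volatile anywhere; the lock and unlock calls provide all the ordering and visibility the compiler and hardware need):

#include <pthread.h>

int shared_count = 0;                            // no volatile needed
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* worker( void* ) {
    for( int i = 0; i < 100000; ++i ) {
        pthread_mutex_lock( &lock );             // acquire semantics: a fresh view of shared data
        ++shared_count;                          // safe: only one thread is in this section
        pthread_mutex_unlock( &lock );           // release semantics: updates published to other threads
    }
    return 0;
}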

anonymous:

To claim that volatile is useless is naive at best. This claim is perhaps as silly as believing that volatile will fix all multithreading issues, which of course, it won't.

Consider the following pseudo-code:

THREAD 1:
struct MyObject{A* volatile a; B* volatile b;};
volatile MyObject o;
volatile bool bDone = false;
...Queue a request to thread 2, asking thread 2 to do something with o...
while (!bDone) {} // spin for a moment while thread 2 does its thing
o.a->DoSomething(); // thread 2 is done, use o.a
o.b->DoSomething(); // use o.b

THREAD 2:
o.a = new A(_thread_local_memory_of_some_sort);
o.b = new B(_non_threadsafe_variable);
_WRITE_BARRIER_
bDone = true;

As written, there is nothing wrong with the above code. (If you "can't like it" see the disclaimer at the end.) The code REQUIRES volatile though. Consider what will happen if volatile is removed:

THREAD 1 without volatile:
struct MyObject{A* a; B* b;};
MyObject o;
bool bDone = false;
...Queue a request to thread 2, asking thread 2 to do something with o...
while (!bDone) {} // !!!!! This may (usually will) spin forever because the compiler may a) reduce this to "while (!false)" or b) cache bDone in a register (which won't get modified by THREAD 2.)
o.a->DoSomething(); // !!!!! This may (usually will) blow up because o.a may be cached, uninitialized, in a register
o.b->DoSomething(); // !!!!! This may (usually will) blow up because o.b may be cached, uninitialized, in a register

Volatile useless? I think not. Variables that can be accessed across threads need to give the compiler a "hint" that they are different. The compiler assumes that variables will be used in a single thread, and optimizes accordingly. Short of turning off all optimization, the only way to address this is to tell the compiler that a variable may change unpredictably...that it is "volatile." No matter how thread-safe your code is otherwise, it won't make up for failure to use volatile where it counts. True, volatile is not a replacement for critical sections; but, neither are critical sections a substitute for using volatile when needed. They address totally different needs.

DISCLAIMER: Obviously there are some things about this example that are not ideal. Some people don't like spinlocks, (though this would be a very good use for one) some people don't like queuing things to other threads (A.K.A. delegates, messages, or signals/slots), some people don't like thread local variables, and o.b should be wrapped with critical sections instead. Please ignore your personal taste, I ignored it so I could make the example 10 lines long rather than 1000.

anonymous:

There could be three possible reasons to use 'volatile':

1) If it was necessary, you would have to use it. But it's not. Things like mutexes do the job.

2) If it was sufficient, you could use it. But it's not. It doesn't provide atomicity or visible ordering from another thread. These are almost always exactly what you need.

That leaves only:

3) If you could combine it with something else so that the net result was sufficient and neither of those things alone are sufficient. There are a very, very limited number of examples where this is the case, but there are none of them in the context of POSIX threads. In all of these cases, 'volatile' is not used for its defined C/C++ semantics but for special semantics it has on that particular platform or as a generic qualifier just to ensure the correct version of an overloaded function is selected.


anonymous:

> I think this article by Andrei Alexandrescu (a few years old) outlines a usage
> of volatile which seems to be quite useful for multithreaded programming:

> http://www.ddj.com/cpp/184403766

I disagree. More discussion here: http://groups.google.com/group/comp.programming.threads/browse_frm/thread/1fa4a82dda916b18/fd6be8f0b18bd62d?#fd6be8f0b18bd62d

gg

Arch D. Robison (Intel):

Right. In C# volatile also has the fence semantics. It's another example where Java/C# use tokens similar to C++, but have very different semantics.

anonymous:

It is good to note that the volatile semantics of Java are such that volatile acts as a memory barrier and prevents reordering. So if you program in Java, don't scrub volatile from your language yet... it's still a very handy keyword!

anonymous:

You are bang on about everything. Just to respond to:

"In my opinion reordering as described IS a compiler bug. It's just stupid
and dangerous so it's gcc fault. Compiler should NEVER do that with
volatile. Read&write of those should ALWAYS stay in place. Someone
must had a reason to set variable volatile. Preceding and following code
blocks could be reordered as much as want, but one should expect
at least that preceeding code WAS executed and following WAS NOT."

What you are saying is that the compiler should penalize legitimate users of 'volatile' (for the things it's documented to be safe for) so that you can abuse it.

Developers of threaded code have two choices:

1) They can extend 'volatile' so that it does what they want. People who use 'volatile' for the purposes suggested in the C standard will suffer a performance penalty. And since 'volatile' is only on or off, it will have to do everything (force ordering, force atomicity, force all types of visibility), so all code that uses it will be very slow.

2) They can let 'volatile' serve its intended purposes and add their own synchronization mechanisms that are finely-tuned to their specific requirements.

For reasons that should be obvious, '2' was selected. So 'volatile' does not force memory ordering because if it did, code that didn't need that would suffer a penalty for no reason. Instead, there are ways to force memory ordering where you need it, such as memory barriers.

anonymous:

I think this article by Andrei Alexandrescu (a few years old) outlines a usage of volatile which seems to be quite useful for multithreaded programming:

http://www.ddj.com/cpp/184403766

My summary would be: using volatile on built-in types is useless and dangerous, but using it on objects (and making use of C++ type checking) is very useful.


