There is a widespread notion that the keyword volatile is good for multi-threaded programming. I've seen interfaces with volatile qualifiers justified as "it might be used for multi-threaded programming". I thought it was useful until the last few weeks, when it finally dawned on me (or if you prefer, got through my thick head) that volatile is almost useless for multi-threaded programming. I'll explain here why you should scrub most of it from your multi-threaded code.
Hans Boehm points out that there are only three portable uses for volatile. I'll summarize them here:
None of these mention multi-threading. Indeed, Boehm's paper points to a 1997 comp.programming.threads discussion where two experts said it bluntly:
"Declaring your variables volatile will have no useful effect, and will simply cause your code to run a *lot* slower when you turn on optimisation in your compiler." - Bryan O' Sullivan
"...the use of volatile accomplishes nothing but to prevent the compiler from making useful and desirable optimizations, providing no help whatsoever in making code "thread safe". " - David Butenhof
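If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address: (1) atomicity, and (2) memory consistency, i.e. the order of a thread's memory operations as seen by another thread.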
Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 129-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.
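To make the atomicity point concrete, here is a minimal sketch, assuming a compiler with the C++11 atomics library (which postdates this post). The helper name count_with_atomic is made up for illustration. It shows only the approach that works: each increment is a single indivisible read-modify-write, which a volatile int counter would not give you.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` threads adds 1 to a shared
// counter `iters` times. std::atomic makes every increment one indivisible
// read-modify-write. With a `volatile int` counter, `++` would typically
// compile to a separate load, add, and store, so concurrent increments
// could be lost.
long count_with_atomic(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int k = 0; k < iters; ++k) {
                // Relaxed ordering is enough here: only the count matters.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return counter.load();
}
```

Calling count_with_atomic(4, 100000) should return exactly 400000 on any conforming implementation, whatever the hardware's native word size.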
Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:
volatile int Ready;
int Message[100];

void foo( int i ) {
    Message[i/10] = 42;
    Ready = 1;
}
It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.
You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium(TM) processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power(TM) will reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Paul McKenney's article Memory Ordering in Modern Microprocessors explains them. Sufficient for discussion here is that volatile has nothing to do with memory fences.
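As a rough illustration of what explicit fences look like, here is the Message/Ready fragment rewritten with the C++11 fence primitives. This is a sketch under the assumption that C++11 atomics are available; wait_and_read and run_handoff are hypothetical names added so the example can be exercised, and Ready is an atomic rather than a volatile.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int Message[100];
std::atomic<int> Ready{0};

// Writer: the release fence keeps the store to Message from being
// reordered after the store to Ready, by the compiler or the hardware.
void foo(int i) {
    Message[i / 10] = 42;
    std::atomic_thread_fence(std::memory_order_release);
    Ready.store(1, std::memory_order_relaxed);
}

// Reader (hypothetical): spin until Ready is set, then fence so that the
// read of Message cannot be hoisted above the read of Ready.
int wait_and_read(int i) {
    while (Ready.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return Message[i / 10];
}

// Hypothetical driver: writer on its own thread, reader on the caller's.
int run_handoff(int i) {
    Ready.store(0);
    std::thread writer(foo, i);
    int seen = wait_and_read(i);
    writer.join();
    return seen;
}
```

The fences order only what needs ordering. Message stays an ordinary array, so the optimizer is still free to work on every access to it that is not part of the hand-off.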
So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:
For example, the parallel reduction template in TBB does all the right fences so you don't have to worry about them.
I spent part of this week scrubbing volatile from the TBB task scheduler. We were using volatile for memory fences because version 1.0 targeted only x86 and Itanium. For Itanium, volatile did imply memory fences. And for x86, we were just using one compiler, and catering to it. All atomic operations were in the binary that we compiled. But now with the open source version, we have to pay heed to other compilers and other chips. So I scrubbed out the volatile qualifiers, replacing them with explicit load-with-acquire and store-with-release operations, or in some cases plain loads and stores. Those operations themselves are implemented using volatile, but that's largely for Itanium's sake. Only one volatile remained, ironically on an unshared local variable! See file src/tbb/task.cpp in the latest download if you're curious about the oddball survivor.
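For readers who have not met load-with-acquire and store-with-release before, the pair can be sketched on top of C++11 atomics as below. These helpers are hypothetical stand-ins written for this illustration; they are not TBB's internal implementation, which is platform specific.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a load that later memory operations in this thread
// cannot be reordered ahead of.
template <typename T>
T load_with_acquire(const std::atomic<T>& location) {
    return location.load(std::memory_order_acquire);
}

// Hypothetical helper: a store that earlier memory operations in this
// thread cannot be reordered behind.
template <typename T>
void store_with_release(std::atomic<T>& location, T value) {
    location.store(value, std::memory_order_release);
}
```

A writer fills in its data and then calls store_with_release on a flag; a reader that sees the flag via load_with_acquire is then guaranteed to see the data as well.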
- Arch
| February 8, 2008 3:46 AM PST
Tom |
There are a couple of misunderstandings in your article. While it is true that using volatile alone and expecting this to make anything thread-safe is a naive (and wrong) assumption, it is not true that volatile is useless, or nearly so. What volatile does is prevent the compiler from caching a value in a register and from doing optimisations that remove operations in which the variable is involved. This is an important property, which is necessary in multi-threaded applications. Atomicity in loads or stores is not related to the C/C++ standard, but is a hardware feature (as long as the addressed units are no larger than register size). However, again, you miss the point here. It does not matter whether or not these are atomic. What matters is that load-modify-store and compare-exchange functions can be made atomic by using the proper instructions (via intrinsics, assembly, or kernel functions). This is what is needed to properly synchronize data between threads. If you can't be sure that the compiler won't optimize out a variable, or hold it in a register, or perform any other smart stuff, then this doesn't work. While it is true that volatile variables are a lot slower to access, even more so if atomic instructions are used (up to 10-15 times slower), the statement that this is "definitively not what you want with threads" shows that you really haven't understood. It is, in fact, EXACTLY what you want. What you don't want to happen is one thread on one core/cpu incrementing a counter while you use the now invalid value in another thread on another cpu. What you don't want to happen is one thread freeing (or simply changing) memory that you are still accessing in another (currently waiting) thread. What you don't want to happen is two threads taking the same head element from a job queue at the same moment, performing the same work twice and finally calling delete on the same pointer twice.
All these issues can of course be safely synchronized by locking/unlocking before every access. However, THIS is what you don't want for the sake of performance. Constructs using volatile variables in combination with atomic instructions (read up on "lockfree programming") offer a much better solution, especially in highly congested scenarios. It is a lot better to burn two dozen CPU cycles using atomic instructions on a volatile than having every access synchronized by two syscalls (lock/unlock) which will chew up several hundred to thousand cycles each. |
| February 11, 2008 8:29 AM PST
Arch Robison (Intel) |
What matters for multithreading is:
1. Atomicity
2. Visibility of memory operations
3. The order in which memory operations become visible.
Volatile in C and C++ flunks on all three counts. All volatile does is prevent a compiler from caching a variable, which is orthogonal to both points above. It just slows programs down. I'll go into points (1) and (2) in detail. My original post already flogged (3). As I'll show, (2) is enough to put a stake through volatile for portable multi-threaded programming.

As Tom notes, atomicity requires using the proper atomic instructions. Volatile does not address this. E.g., even if I declare x as volatile, x+=1 is not going to be compiled as an atomic increment. So special instructions outside the scope of the C/C++ standards must be used to access/modify x atomically. But that implies that *every* atomic access to x is outside the scope of the C/C++ standards. As far as the C/C++ standards are concerned, all the compiler sees is the address of x (or reference to x) being passed to routines outside the ken of the compiler. So declaring x as volatile is pointless; the compiler cannot cache loads/stores to x because in principle it does not see the loads/stores to x.

Of course for specific compiler implementations we might know that a volatile load or store of a certain size is always compiled as an atomic operation. That's how TBB implements its internal __TBB_load_with_acquire and __TBB_store_with_release operations for some platforms. But we declare only the formal parameter as pointer-to-volatile and do not declare the actual variable as volatile, because what we are doing is platform specific. Indeed, if we were squeaky-clean about it, we would not even declare the formal parameter as pointer-to-volatile, but hide this platform-specific detail completely by casting the formal parameter to pointer-to-volatile. The portable portions of code should not declare any variables as volatile.
They should call __TBB_load_with_acquire and __TBB_store_with_release to do the atomic loads and stores.

Now consider point (2), where the ISO C/C++ volatile is downright counterproductive. (Except on Itanium, because of an Intel-specific interpretation of volatile.) Consider the job queue example. Let's assume the queue holds either zero or one elements. Such a queue can be implemented as a shared pointer R in memory that is either NULL or points to the queue's element. I'm simplifying the queue to make the core issue more obvious. With serious queue implementations the same issues strike with a vengeance.
1. Thread 1 computes a matrix product M and atomically sets R to point to M.
2. Thread 2 waits until R!=NULL and then uses M as a factor to compute another matrix product.
In other words, M is a message and R is a ready flag. Let's consider sequentially consistent machines. Never mind the memory fence issues that are critical to real machines and not addressed by volatile. I'll show that volatile is both a performance killer *and* useless for a sequentially consistent machine. Here's the key excerpt in the C99 standard:

> The least requirements on a conforming implementation are:
> - At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.

Note that only volatile objects are required to be "stable". Volatile accesses have no effect on non-volatile accesses. Thus to rely on volatile in the example requires declaring *both* R and M as volatile. Declaring M as volatile slows down operations on it significantly. In general, much of multithreaded programming relies on a notion of privatization, where one thread operates on an object (such as M) and then passes it off to another thread. At any point in time, the object is being accessed only by one thread.
But if we are going to depend upon volatile to get the hand-off right, we have to declare the object as volatile. That inflicts a heavy penalty on access to the object. In principle, we just have to ensure that, when handing off an object, every location associated with the object was last written with a volatile store before the hand-off, and that every first read of a location by the receiving thread is done with a volatile load after the hand-off. Keeping track of that information would definitely be a pain. But it gets worse. Consider a thread handing off a std::map object to another thread. There's no way for a thread to even know all the locations inside the implementation of std::map and mark them all volatile. I suppose a program could serialize the object into a volatile buffer, hand off the buffer, and reconstruct it from the buffer. That essentially inflicts all the pain of message-passing onto shared memory programming. Shared memory programming has enough pain. What we really want are memory fences that force the std::map to be written to memory by the sending thread before the receiving thread reads it. We do not want to turn off the optimizer, but merely enforce the order of some writes and reads.

To summarize, multi-threaded programming is about atomicity and very precise control of the order in which memory operations become visible. Volatile does not address atomicity. Marking an object as volatile turns off caching of the object's value, which is a terribly imprecise and inefficient way to achieve the desired order of visibility, because all locations passed between threads would have to be marked volatile. To get the correct order efficiently requires some notion of memory fencing, which is outside the current C/C++ standards, but will be in future versions of those standards. |
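The std::map hand-off described above can be sketched with the atomics that later arrived in C++11. This is only an illustration under that assumption: Table, sender, receiver and run_map_handoff are hypothetical names, and R plays the role of the one-element queue from the comment. Nothing inside the map is marked volatile.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <string>
#include <thread>

using Table = std::map<std::string, int>;

// R is both the ready flag and the message pointer: null until published.
std::atomic<Table*> R{nullptr};

void sender() {
    Table* m = new Table;      // private to this thread while being built
    (*m)["rows"] = 3;
    (*m)["cols"] = 4;
    // Release store: every write made while building the map becomes
    // visible to a thread that acquires this pointer.
    R.store(m, std::memory_order_release);
}

int receiver() {
    Table* m;
    // Acquire load pairs with the release store in sender().
    while ((m = R.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    int product = (*m)["rows"] * (*m)["cols"];
    delete m;                  // the receiver now owns the map
    return product;
}

// Hypothetical driver so the hand-off can be exercised.
int run_map_handoff() {
    R.store(nullptr);
    std::thread t(sender);
    int result = receiver();
    t.join();
    return result;
}
```

The only operations that pay for ordering are the single store and load of R. All work on the map itself, before and after the hand-off, is ordinary optimized code.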
| March 25, 2008 2:29 AM PDT
kappa | link "Hans Boehm points out that there are only three portable uses for volatile" has title but not href |
| March 25, 2008 7:52 AM PDT
Arch Robison (Intel) | Thanks for pointing out the missing link. It's now repaired. |
| April 18, 2008 9:54 AM PDT
regehr |
Arch-- just wanted to point you to some results that may indicate that volatile is even less useful than one would hope, since compilers tend to not properly respect it. http://www.cs.utah.edu/~regehr/papers/emsoft08_submit.pdf John Regehr |
| April 18, 2008 11:57 AM PDT
regehr |
Also here's an example (oddly, icc gets it right):

[regehr@babel ~]$ cat > foo.c
volatile int x;
void foo (void)
{
  x;
}
[regehr@babel ~]$ icpc -S foo.c
[regehr@babel ~]$ cat foo.s
# -- Machine type IA32
# mark_description "Intel(R) C++ Compiler for applications running on IA-32, Version 10.1 Build 20070913 %s";
# mark_description "-S";
        .file "foo.c"
        .text
..TXTST0:
# -- Begin _Z3foov
# mark_begin;
        .align 2,0x90
        .globl _Z3foov
_Z3foov:
..B1.1: # Preds ..B1.0
        ret #5.1
        .align 2,0x90
        # LOE
# mark_end;
        .type _Z3foov,@function
        .size _Z3foov,.-_Z3foov
        .data
# -- End _Z3foov
        .bss
        .align 4
        .align 4
        .globl x
x:
        .type x,@object
        .size x,4
        .space 4 # pad
        .data
        .section .note.GNU-stack, ""
# End
[regehr@babel ~]$ |
| April 18, 2008 12:48 PM PDT
Arch Robison (Intel) |
I liked the paper. Before I thought volatile was useless; now I'm scared of it :-) What about checking volatile on fields and structures? A compiler could forget to propagate a volatile qualifier on a struct to the fields inside. Likewise for a volatile array (as opposed to an array of volatile elements). The restrict keyword from C99 might offer further mischief. E.g., there may be transforms so focussed on restrict that they forget about volatile. Casts and inlining offer other possibilities for compiler error when the "to" and "from" types differ in volatile qualifiers. One of the inventors of C (Dennis Ritchie) was against volatile (and const). See here (http://www.lysator.liu.se/c/dmr-on-noalias.html). |
| April 18, 2008 4:11 PM PDT
regehr |
Thanks for the comments! Definitely structs and arrays would be great to test. That DMR essay is great -- I wonder how "restrict" snuck back in? I heard of a great study (don't have a reference handy unfortunately) where someone profiled programs' memory behavior in order to add in a maximal amount of restrict qualifiers, then recompiled and got no speedup at all :) I think it is not hard to argue against all uses of volatile. As you say, it's a poor choice for communication and synchronization between threads. Register accesses can be done through function calls to asm stubs. That doesn't seem to leave many uses... Anyway thanks for the example about moving memory operations past volatile operations, a friend of mine who works at a major embedded systems company didn't believe that this would ever be done by a compiler until I pointed him to this blog post. |
| May 20, 2008 10:02 AM PDT
Spud |
Well, how about a working thread repeatedly checking if the job has been cancelled? Assuming that bool reads and writes are atomic, will the following C++ snippet work as expected?

class WorkThread {
    volatile bool abort;
public:
    void run() {
        ...
        abort=false;
        while(job.notFinished()) {
            job.doChunk();
            if(abort) return;
        }
        ...
    }
    void cancel() { abort=true; }
};

I know I will still need a mutex/waitcondition or similar to synchronize threads, but it would be shocking to find out that the above code could be executed in some arbitrary order. |
| May 20, 2008 9:30 PM PDT
Arch Robison (Intel) | Yes, the above should work given the assertions. At worst it can hoist the read of "abort" above job.doChunk(), which presumably does no damage in an example like this. |
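For comparison, here is a minimal sketch of the same cancellation pattern with the flag as a std::atomic<bool>, assuming C++11 atomics are available. The chunk counter and the run_and_cancel driver are hypothetical additions that stand in for job.doChunk() and make the example self-checking. The flag is initialized in the class rather than at the top of run(), so a cancel() that arrives before run() starts is not overwritten.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class WorkThread {
    std::atomic<bool> abort_{false};
    std::atomic<int> chunksDone_{0};   // hypothetical stand-in for real work
public:
    // Keeps doing chunks until cancel() has been observed.
    void run() {
        while (!abort_.load(std::memory_order_acquire)) {
            chunksDone_.fetch_add(1, std::memory_order_relaxed);
            std::this_thread::yield();
        }
    }
    void cancel() { abort_.store(true, std::memory_order_release); }
    int chunksDone() const { return chunksDone_.load(); }
};

// Hypothetical driver: start the worker, wait for some progress, cancel.
bool run_and_cancel() {
    WorkThread w;
    std::thread t([&w] { w.run(); });
    while (w.chunksDone() == 0) std::this_thread::yield();
    w.cancel();
    t.join();   // returns only once run() has seen the flag
    return w.chunksDone() > 0;
}
```

Unlike the volatile version, this does not depend on an assumption about bool accesses being atomic on the target; the atomicity is stated in the type.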
| May 30, 2008 11:26 PM PDT
Nervousone |
> void foo( int i ) {
>     Message[i/10] = 42;
>     Ready = 1;
> }
> The reordering is not a compiler bug.

In my opinion reordering as described IS a compiler bug. It's just stupid and dangerous so it's gcc fault. Compiler should NEVER do that with volatile. Read&write of those should ALWAYS stay in place. Someone must had a reason to set variable volatile. Preceding and following code blocks could be reordered as much as want, but one should expect at least that preceeding code WAS executed and following WAS NOT. Any other approach is just bad and for me only option is ignore fact that some bunch of people was not thinking first when projecting (damn committees) then another bunch when writing (compiler) and I am happy at least I can just turn optimization off in functions accessing volatile, but that not deny stupidity of compilers ignoring volatile qualifier and doing whatever want. I'm not talking now about other things as hardware caches or memory issues, but have no idea who gave anyone right to consider volatile's in ANY kind of optimisations ?!? That kind of data definitely should be excluded from it entirely and unconditionally. |
| June 2, 2008 12:06 PM PDT
Arch Robison (Intel) |
The committees indeed think very hard about this sort of thing. Adding fence semantics to volatile was considered by the C++ committee. See N2016 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html) for why adding fence behavior ("inter-thread visibility") to volatile was rejected. Instead, C++ 200x has support for fencing via its atomic operations library. See Chapter 29 of the working draft (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2606.pdf). With that library, the example can be written correctly as:

std::atomic_bool Ready;
int Message[100];

void foo( int i ) {
    Message[i/10] = 42;
    Ready = 1;
}

Intel Threading Building Blocks (http://www.threadingbuildingblocks.org) has a class tbb::atomic<bool> that similarly makes the example work. Disclaimer: that is a shameless plug from TBB's architect :-) |
| July 7, 2008 2:30 AM PDT
Ondrej Spanel |
I think this article by Andrei Alexandrescu (a few years old) outlines a usage of volatile which seems to be quite useful for multithreaded programming: http://www.ddj.com/cpp/184403766 My summary would be: using volatile on built-in types is useless and dangerous, but using it on objects (and making use of C++ type checking) is very useful. |
| August 9, 2008 10:30 PM PDT
David Schwartz |
You are bang on about everything. Just to respond to: "In my opinion reordering as described IS a compiler bug. It's just stupid and dangerous so it's gcc fault. Compiler should NEVER do that with volatile. Read&write of those should ALWAYS stay in place. Someone must had a reason to set variable volatile. Preceding and following code blocks could be reordered as much as want, but one should expect at least that preceeding code WAS executed and following WAS NOT." What you are saying is that the compiler should penalize legitimate users of 'volatile' (for the things it's documented to be safe for) so that you can abuse it. Developers of threaded code have two choices: 1) They can extend 'volatile' so that it does what they want. People who use 'volatile' for the purposes suggested in the C standard will suffer a performance penalty. And since 'volatile' is only on or off, it will have to do everything (force ordering, force atomicity, force all types of visibility), so all code that uses it will be very slow. 2) They can let 'volatile' serve its intended purposes and add their own synchronization mechanisms that are finely-tuned to their specific requirements. For reasons that should be obvious, '2' was selected. So 'volatile' does not force memory ordering because if it did, code that didn't need that would suffer a penalty for no reason. Instead, there are ways to force memory ordering where you need it, such as memory barriers. |
| August 27, 2008 11:57 AM PDT
Irwin | It is good to note that the volatile semantics of Java are such that volatile acts as a memory barrier and prevents reordering. So if you program in Java, don't scrape volatile from your language yet... it's still a very handy keyword ! |
| August 27, 2008 3:42 PM PDT
Arch Robison (Intel) | Right. In C# volatile also has the fence semantics. It's another example where Java/C# use tokens similar to C++, but have very different semantics. |
| September 2, 2008 10:35 AM PDT
Codeplug |
> I think this article by Andrei Alexandrescu (a few years old) outlines a usage
> of volatile which seems to be quite useful for multithreaded programming:
> http://www.ddj.com/cpp/184403766

I disagree. More discussion here: http://groups.google.com/group/comp.programming.threads/brow.....f0b18bd62d

gg |
| April 23, 2009 8:38 PM PDT
David Schwartz |
There could be three possible reasons to use 'volatile': 1) If it was necessary, you would have to use it. But it's not. Things like mutexes do the job. 2) If it was sufficient, you could use it. But it's not. It doesn't provide atomicity or visible ordering from another thread. These are almost always exactly what you need. That leaves only: 3) If you could combine it with something else so that the net result was sufficient and neither of those things alone are sufficient. There are a very, very limited number of examples where this is the case, but there are none of them in the context of POSIX threads. In all of these cases, 'volatile' is not used for its defined C/C++ semantics but for special semantics it has on that particular platform or as a generic qualifier just to ensure the correct version of an overloaded function is selected. |
| April 26, 2009 2:18 PM PDT
Bug Slayer |
To claim that volatile is useless is naive at best. This claim is perhaps as silly as believing that volatile will fix all multithreading issues, which of course, it won't. Consider the following pseudo-code:

THREAD 1:
struct MyObject{volatile A a; volatile B b;};
volatile MyObject o;
volatile bool bDone = false;
...Queue a request to thread 2, asking thread 2 to do something with o...
while (!bDone) {} // spin for a moment while thread 2 does its thing
o.a->DoSomething(); // thread 2 is done, use o.a
o.b->DoSomething(); // use o.b

THREAD 2:
o.a = new A(_thread_local_memory_of_some_sort);
o.b = new B(_non_threadsafe_variable);
_WRITE_BARRIER_
bDone = true;

As written, there is nothing wrong with the above code. (If you "can't like it" see the disclaimer at the end.) The code REQUIRES volatile though. Consider what will happen if volatile is removed:

THREAD 1 without volatile:
struct MyObject{A a; B b;};
MyObject o;
bool bDone = false;
...Queue a request to thread 2, asking thread 2 to do something with o...
while (!bDone) {} // !!!!! This may (usually will) spin forever because the compiler may a) reduce this to "while (!false)" or b) cache bDone in a register (which won't get modified by THREAD 2.)
o.a->DoSomething(); // !!!!! This may (usually will) blow up because o.a may be cached, uninitialized, in a register
o.b->DoSomething(); // !!!!! This may (usually will) blow up because o.b may be cached, uninitialized, in a register

Volatile useless? I think not. Variables that can be accessed across threads need to give the compiler a "hint" that they are different. The compiler assumes that variables will be used in a single thread, and optimizes accordingly. Short of turning off all optimization, the only way to address this is to tell the compiler that a variable may change unpredictably...that it is "volatile." No matter how thread-safe your code is otherwise, it won't make up for failure to use volatile where it counts.
True, volatile is not a replacement for critical sections; but, neither are critical sections a substitute for using volatile when needed. They address totally different needs. DISCLAIMER: Obviously there are some things about this example that are not ideal. Some people don't like spinlocks, (though this would be a very good use for one) some people don't like queuing things to other threads (A.K.A. delegates, messages, or signals/slots), some people don't like thread local variables, and o.b should be wrapped with critical sections instead. Please ignore your personal taste, I ignored it so I could make the example 10 lines long rather than 1000. |
| April 27, 2009 4:36 PM PDT
David Schwartz |
Bug Slayer: Your argument is nonsense. You might as well say this, "Assume a platform on which 'volatile' is necessary. On this platform, 'volatile' is necessary. Therefore the argument that volatile is not necessary is nonsense." You have a "_WRITE_BARRIER_" in your code. You don't specify the semantics of this barrier. Is it something that interacts in some special way with 'volatile' or not? If it isn't, then it can blow up even with volatile. Though it can't be cached in a register, it can be cached elsewhere, say in the CPU's pre-fetch buffer or in the other CPU's write posting buffer.

"Variables that can be accessed across threads need to give the compiler a "hint" that they are different. The compiler assumes that variables will be used in a single thread, and optimizes accordingly. Short of turning off all optimization, the only way to address this is to tell the compiler that a variable may change unpredictably...that it is "volatile." No matter how thread-safe your code is otherwise, it won't make up for failure to use volatile where it counts. True, volatile is not a replacement for critical sections; but, neither are critical sections a substitute for using volatile when needed. They address totally different needs."

This is totally and utterly false. YOU DO NOT NEED TO USE VOLATILE IF YOU USE CRITICAL SECTIONS. Period, end of story. So, no, the compiler does not need a hint. No, the compiler does not assume variables will be used in a single thread. No, volatile is not the only way to tell the compiler that a variable is accessed by other threads. Answer this simple yes or no question: "If my code accesses all shared variables under the protection of a single mutex, do I still need to declare any shared variables volatile?" If you answer "yes", you're simply factually wrong. If you answer "no", then it refutes about 2/3 of your argument that you "need to give the compiler" a hint. |
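The mutex question can be made concrete with a small sketch. It assumes the C++11 threading library rather than any particular platform API, and the names Shared, thread2_work and run_mutex_handoff are hypothetical. It follows the shape of the two-thread example under discussion, with every shared variable accessed under one mutex and nothing declared volatile.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

struct Shared {
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;   // plain bool, deliberately not volatile
    int a = 0;
    int b = 0;
};

// Plays the role of THREAD 2: fill in the object, then signal completion.
void thread2_work(Shared& s) {
    std::lock_guard<std::mutex> lock(s.mtx);
    s.a = 10;
    s.b = 20;
    s.done = true;
    s.cv.notify_one();
}

// Plays the role of THREAD 1: wait for completion, then use the object.
int run_mutex_handoff() {
    Shared s;
    std::thread worker(thread2_work, std::ref(s));
    int sum;
    {
        std::unique_lock<std::mutex> lock(s.mtx);
        // wait() rechecks the predicate under the lock, so the compiler
        // cannot treat `done` as a loop-invariant value.
        s.cv.wait(lock, [&s] { return s.done; });
        sum = s.a + s.b;
    }
    worker.join();
    return sum;
}
```

The lock and unlock operations are what tell the compiler and the hardware that other threads may have touched the data, which is the job the "hint" was supposed to do.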
| June 30, 2009 5:00 PM PDT
Ian Lewis |
Regarding Alexandrescu's article: the discussion that Codeplug refers to seems to have missed the part of Alexandrescu's article that says "never use volatile with built-in types." I'm kind of on the fence about this one--I can see why it might be good to banish volatile to the same dust heap of history where goto currently resides (along with C-style casts, std::auto_ptr, and maybe const, too, depending on who you listen to). On the other hand, (mis)using volatile in the way Alexandrescu suggests is currently making my life easier by helping me identify shared objects in our codebase. We're introducing TBB-style task parallelism into a huge legacy game engine. By marking top-level shared objects "volatile" and following Alexandrescu's rules of "volatile correctness," I'm able to catch a good number of potential issues at compile time rather than seeing them at runtime (or finding them via grep, which was my other option). It's a hack, but I haven't been able to come up with anything that is less hacky and/or more useful. Perhaps we could just treat the "volatile==atomic" myth the same way we treat the "const lets the compiler optimize more" myth. Neither keyword does anything magic, nor is either one capable of hinting the compiler in any particularly useful way. But they do serve as useful extensions to the type system that can help a reasonably disciplined, maintainable codebase stay disciplined and maintainable. |
| July 7, 2009 6:56 AM PDT
David Schwartz |
The problem with doing that, Ian, is that it kills performance. In any event, unless your codebase is very unusual, it makes no sense to "identify shared objects". *All* objects in memory are shared. Depending on what you really mean, there's likely a way to do it that doesn't compromise performance. For example, code that basically says, "we must hold lock X here because we manipulate object Y -- in a debug build, fault if we do not". Or code that says, "we should have released lock X by the time we got here -- in a debug build, fault if we did not, in a release build, release the lock". It would be nice, however, if C++ supported custom qualifiers that you could apply to variables just to mark them, to fail at compile time if they're not matched, and to select the desired overload. If I get some time, I'll see if I can throw a proposal together. |
| July 8, 2009 9:58 PM PDT
Chris |
Let's assume this code is executed on an x86 Intel processor.

volatile int Ready;
int Message[100];

void foo( int i ) {
    Message[i/10] = 42;
    Ready = 1;
}

As stated earlier, Ready = 1 can be moved above Message[i/10] = 42. However, suppose that the code was rewritten such that a memory fence is inserted between the two assignment statements.

void foo(int i) {
    Message[i/10] = 42;
    __asm{ mfence }
    Ready = 1;
}

Let's further assume that the compiler will not reorder instructions around the inline assembly. If Ready can be cached in a register, then the mfence has no impact. Thus, volatile is necessary, but not sufficient, for writing correct, lock free, multi-threaded code. Can anyone either validate or disprove my understanding of the value of volatile for lock free multithreading? |
| July 15, 2009 5:59 PM PDT
Ian Lewis |
David, I have to disagree. First, although I thought that it was obvious in context, let me explain what I "really meant" by shared objects. I mean mutable objects that may be accessed concurrently by multiple threads. Our codebase is only unusual in the sense that it's multithreaded. I've long considered one of the core skills of multithreaded programming to be the ability to separate shared objects from thread-local objects. With this definition in mind, it should be obvious that saying "all objects in memory are shared" is as facile as saying "all objects in memory are writable." Of course this is to one extent or another true, but knowing that most of our memory is writable doesn't stop us from wanting our code to be const correct. The point is not whether the hardware is capable of writing or sharing data; the point is what the programmer's intent is. (If there's still confusion as to what *my* intent was in my last post, I used the word "shared" to mean "shared between multiple threads." Sorry for not being more clear.)
When I mark an object "const" I know full well that I'm not waving a magic wand and making my object impossible to alter. But I am marking it in such a way that the compiler will catch unintentional misuse.
If you really believe that this technique kills perf, you might want to read Alexandrescu's article more closely. Your perf will suffer if you do something like make your data members public, or mark all of your access functions volatile (and forget to cast away the volatile-ness before accessing your data members). If you write properly encapsulated classes and use Alexandrescu's LockingPtr<> template, the volatile gets cast away before it has a chance to infect your codegen. The compiler is free to optimize as usual. I can't speak for all compilers and platforms, but MSVC 2008 appears to have no trouble optimizing code in the scope of a LockingPtr<>.
One easy mistake to make is to over-volatilize things. For instance, you don't want to mark every data member volatile (in fact, you never want to mark built-in types as volatile, for reasons that Alexandrescu addresses in his article). But if you're used to writing concurrent code, this should just be common sense. If you want to insert ten items into a list, you don't want to put a critical section into your List::Insert function. If at all possible, you write code outside the list to acquire the lock, add all ten items, then release it. Likewise, you want to mark things volatile at the highest possible level. The presence of the volatile keyword means "accessing this data requires a lock." If you place the volatile qualifier on a top-level data structure, then once you lock the structure you can cast away the volatile and have safe, optimized access to all of its members and their members and so on.
The "in debug, fault if not locked" paradigm is in our codebase too. It complements but does not replace the "volatile correct" paradigm. One works at runtime, the other at compile time. Neither one is guaranteed to find all threading bugs, but both are helpful.
Your last point is an excellent one. Given the choice I'd much rather be able to define my own qualifiers. In the absence of that choice, I'm abusing volatile. I'm not going to argue that it's a best practice for everyone. But when your job is to turn hundreds of thousands of lines of single-threaded code into a parallel application, you'll take whatever helps. :-)
Ian
P.S. I'd be happy to continue this conversation if you have more questions or can prove me wrong about this. :-) Email me at ian dot lewis at intel dot com. |
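For readers who have not seen the technique Ian describes, here is a minimal sketch of the idea: volatile acts as a "needs a lock" tag, and a small guard takes the lock and casts the tag away. This is not Alexandrescu's actual LockingPtr<> code. The class body, the use of std::mutex (which postdates this thread), and the helper name append_ten are all assumptions made for illustration. The sketch also assumes the underlying object is defined non-volatile and only exposed through a volatile reference, because accessing an object that was itself defined volatile through a non-volatile lvalue is undefined behavior in standard C++.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical guard in the spirit of LockingPtr<>: holds the lock for its
// lifetime and exposes the object with the volatile qualifier cast away.
template <typename T>
class LockingPtr {
public:
    LockingPtr(volatile T& obj, std::mutex& mtx)
        : obj_(const_cast<T*>(&obj)),  // strip the "needs a lock" tag
          lock_(mtx) {}                // acquire the lock; released in the destructor

    T& operator*() const { return *obj_; }
    T* operator->() const { return obj_; }

private:
    T* obj_;
    std::unique_lock<std::mutex> lock_;
};

// Hypothetical helper: lock once at the top level, then do all ten inserts.
std::size_t append_ten(volatile std::vector<int>& shared, std::mutex& mtx) {
    // shared.push_back(0);  // would not compile: push_back is not a
    //                       // volatile-qualified member function
    LockingPtr<std::vector<int>> items(shared, mtx);
    for (int i = 0; i < 10; ++i) {
        items->push_back(i);  // ordinary, fully optimizable access under the lock
    }
    return items->size();
}
```

A caller would define the vector as a plain std::vector<int>, bind a volatile std::vector<int>& to it, and hand only that reference to code that may run on other threads. Any attempt to use the container without going through the guard then fails at compile time, which is the whole point of the tag.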
| July 24, 2009 6:57 PM PDT
moo |
I'm not sure if it's been mentioned before, but volatile has some interesting uses in multi-threaded programming. It's just that ensuring atomic or correctly-ordered accesses to primitive-typed variables is not one of them. This article from 2001 describes how to use the volatile qualifier to automatically catch situations with race conditions: http://www.ddj.com/cpp/184403766 However, the volatile keyword does uncover lots of codegen and optimization bugs in compilers: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf https://www.securecoding.cert.org/confluence/display/seccode/DCL17-C.+Beware+of+miscompiled+volatile-qualified+variables |
| August 28, 2009 1:04 AM PDT
heh | A little late but I have to chime in. This is a pretty silly post. Volatile is vitally necessary for implementing the very things you say are needed for multithreading, and this is sort of like saying you don't need bricks, you only need walls. Made of bricks. Many people use volatile in silly ways, but what C++ feature is that not true for? |

John "Z-Bo" Zabroski
An example of an alternative to using volatile is to use a surrogate object that manages the resource. For example, in Java, Bill Pugh's Initialization on Demand Holder Idiom uses a surrogate object to wrap a placeholder for a resource that hasn't been acquired yet. It's a little strange to think of it as a surrogate object, but it helps to realize that once the resource it's guarding is used, all threads effectively see the same shadow, even though the blueprint for each thread might use a different variable name to refer to the shadow. All threads can manipulate the shadow.
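The holder idiom itself is Java, so the following is only a rough C++ analogue rather than the idiom as Pugh wrote it: a function-local static. Since C++11 the language guarantees that such a static is initialized exactly once, even if several threads make the first call concurrently, so no volatile flag is needed. The Config type, its construct_count counter, and config_instance are hypothetical names invented for this sketch.

```cpp
#include <string>

// Hypothetical resource, used only for illustration.
struct Config {
    std::string name;

    Config() : name("default") { ++construct_count(); }

    // Counts how many times a Config has been constructed.
    static int& construct_count() {
        static int n = 0;
        return n;
    }
};

// The function-local static plays the role of the holder: it is constructed
// on the first call, and C++11 and later guarantee that concurrent first
// calls still construct it exactly once.
Config& config_instance() {
    static Config instance;
    return instance;
}
```

Every caller, on any thread, gets a reference to the same object, which is roughly the "same shadow" described above. Note that only the initialization is protected; later mutation of the shared object still needs its own synchronization.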
Fences and Atoms are, in my eyes, kinds of surrogate objects. I'm not sure if that is a useful metaphor, though. I'm just 23 and don't have enough experience writing multi-threaded programs with complicated resource contention issues.
Also, I think the opposing view to yours on the use of volatile can be found here: http://www.ddj.com/cpp/184403766 and its author, Andrei Alexandrescu, discusses it more here: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
Andrei also has two C++ standards survey papers related to this: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2004/n1680.pdf
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1777.pdf