Volatile Worthless or Not
Over at this blog, Arch Robison claimed that volatile is almost worthless for multithreaded programming. I (Chris) argue that volatile is necessary for memory fences to work in lock-free programming. Can anyone clarify this?
http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
Re: Volatile Worthless or Not
>> Over at this blog, Arch Robison claimed that volatile is almost worthless for multithreaded programming. I, Chris, argue that volatile is necessary in order for memory fences to work with lock free programming. Can anyone clarify this?
Arch is referring to ISO C++'s volatile, which has nothing to do with multithreading, memory fences or lock-free programming. You are probably referring to Microsoft Visual C++'s volatile, which is essentially promoted to the rank of a multithreading synchronization primitive (since Visual C++ 2005, volatile reads have acquire semantics and volatile writes have release semantics).
Re: Volatile Worthless or Not
I think this is about flushing out the writes somehow, so that the other thread gets to see the new Ready value, which makes at least some sense to me. But I'm also in favour of using real atomics, and you can write those without a single "volatile" (I did, anyway), because it would be redundant with the inline-assembler store instructions. Still, what prevents the optimiser from reordering even those with later code, perhaps code that waits for a value that cannot appear before the store is seen by another thread, leading to deadlock? I checked my code again, and I thought I had at least a compiler fence at both ends to prevent just that, but apparently not, and I now consider that an oversight. I'm aware that language-level atomics can do other kinds of things, if only the specification were readable...
(Added) "I did, anyway": or not yet... they're still there in tbb_machine.h.
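The Ready-flag pattern above can be sketched with C++11 std::atomic, the standardised form of the C++0x atomics this thread anticipates. This is a minimal sketch under assumptions: payload, ready and publish_and_consume are hypothetical names, and release/acquire ordering stands in for volatile plus hand-written fences.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // ordinary (non-atomic, non-volatile) data
std::atomic<bool> ready{false};  // the "Ready" flag

int publish_and_consume() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    std::thread producer([] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // earlier writes may not be moved after this store
    });

    // The acquire load pairs with the release store: once it reads true,
    // the write to payload is guaranteed to be visible to this thread.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;
    producer.join();
    assert(seen == 42);
    return seen;
}
```

Neither variable needs volatile here: the release/acquire pair restrains both the compiler and the hardware, which is what the Ready flag is meant to achieve.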
Re: Volatile Worthless or Not
atomic<T> (from my understanding) has operators operating on (naturally aligned) volatile-qualified variables (look at the primitives in TBB's atomic.h).

volatile, though, does not ensure that the variable is aligned on a natural boundary (naturally aligned variables can be atomically read, written, or LOCK read-modify-written). volatile does ensure that the compiler will always generate code to reference memory.

atomic<T> typed variables are aligned on a natural boundary. Therefore atomic typed variables, when used internally with volatile, are assured to atomically R, W, or LOCK RMW.

Variables typed only volatile are not assured to be aligned on a natural boundary, and therefore are NOT assured to atomically R, W, or LOCK RMW...
...UNLESS the programmer takes care to enforce alignment rules on such declared variables.

atomic<T> typed variables are permitted to have some optimizations performed on them, and some of these optimizations may interfere with multithreaded programming:

    atomic<int> line = 0; ... line = 1; line = 2;

The compiler is permitted to remove the first store. Therefore, if another thread is monitoring progress, it will never see line=1. Be cautious in assuming atomic<T> variables will behave as you intend. If you require some probability of reading all instances of (programmed) writes, then do not use atomic; use volatile:

    volatile int line = 0; ... line = 1; line = 2;

The compiler is NOT permitted to remove the first store. However, it is not required to align line on a natural boundary.

A stack-local int variable will tend to be naturally aligned. A structure member variable is not assured to be naturally aligned. As a programmer, you are required to ensure alignment (when atomicity is required).
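The alignment point can be checked at compile time and at run time. A minimal sketch with hypothetical struct and function names; #pragma pack is a widely supported compiler extension (GCC, Clang, Visual C++, Intel), not standard C++, so the offset of 1 it produces is what those compilers do rather than something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packing removes padding, so 'line' lands at offset 1: not naturally aligned.
#pragma pack(push, 1)
struct Packed {
    char tag;
    volatile int line;
};
#pragma pack(pop)

// Default layout, with the alignment requirement spelled out explicitly.
struct Aligned {
    char tag;
    alignas(sizeof(int)) volatile int line;
};
static_assert(offsetof(Aligned, line) % sizeof(int) == 0,
              "line must sit on a natural boundary");

// Run-time check for a variable whose declaration you do not control.
bool is_naturally_aligned(const volatile void* p, std::size_t size) {
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

void check_layout() {
    Aligned a{};
    assert(is_naturally_aligned(&a.line, sizeof(int)));
    assert(offsetof(Packed, line) == 1);  // holds on mainstream compilers with pack(1)
}
```

The static_assert turns a forgotten alignment requirement into a build failure instead of a torn read at run time.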
Jim Dempsey
Blog: The Parallel Void
www.quickthreadprogramming.com
Re: Volatile Worthless or Not
"Therefore atomic typed variables, when used internally with volatile, are assured to atomically R, W, or LOCK RMW."
The RMW operations require specific assembler instructions. For ordinary loads and stores, though, the existing TBB implementation does not bother with those on x86 (except for 8-byte data) or x64, relying on volatile instead; that's true. I don't agree with that approach, even if it happens to work. Still, I think some existing code would break if volatile were taken out, so that decision is not so easy to make.
Re: Volatile Worthless or Not
>> The RMW operations require specific assembler instructions

Correct - these (RMW) are provided with compiler intrinsics (or inline assembler). The atomic<T> class will use these compiler intrinsics (or inline assembler), so atomic will "hide" the nasties.

However, use of atomic will at times require

    if(var.read_the_variable_right_now_no_matter_what()==whatnot) ...

or whatever the member function ends up being called (or, in lieu of a member function, some new keyword/directive as in C++0x), where volatile can use

    if(var==whatnot) ...

provided you are also careful to correctly align the volatile variable.

atomic variables are good and thread safe, BUT the behavior may not necessarily produce the desired result. The usage cautions are not clearly documented, at least not to the point of presenting a clear picture up front. The same issue is involved with volatile with respect to alignment/atomicity. In both cases the usage examples should include "***WARNING***: when programming xxx, use yyy technique."
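In the C++11 standard that grew out of C++0x, the explicit "read the variable right now" member function is spelled load(), with an optional memory order. A minimal sketch with hypothetical variable and function names; note that a plain == on a std::atomic<int> also compiles, through an implicit conversion that performs a sequentially consistent load.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flags, for illustration only.
std::atomic<int> flag_a{0};
volatile int flag_v = 0;  // ISO C++: forces a memory access, but no atomicity or ordering guarantee

bool atomic_equals_explicit(int whatnot) {
    // Spelled-out read with an explicit ordering.
    return flag_a.load(std::memory_order_acquire) == whatnot;
}

bool atomic_equals_implicit(int whatnot) {
    // operator int() is equivalent to load(std::memory_order_seq_cst).
    return flag_a == whatnot;
}

bool volatile_equals(int whatnot) {
    // Always re-reads memory; atomicity depends on alignment and the platform.
    return flag_v == whatnot;
}
```

The explicit form makes the chosen ordering visible at the call site; the implicit form reads like the volatile version but silently uses the strongest (and potentially most expensive) ordering.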
Jim Dempsey
Re: Volatile Worthless or Not
"or in lieu of member function some new keyword/directive as in C++0x" I haven't spotted that yet?
"BUT the behavior may not necessarily produce the desired result." Specifically? The compiler's coalescing optimisation you mentioned (let's discount statistics), or something else (too)?
"The same issue is involved with volatile with respect to alignment/atomicity" Such use seems dubious (not portable), even ignoring memory semantics.
Re: Volatile Worthless or Not
>> Still, what prevents the optimiser from reordering even those with later code, perhaps code that waits for a value that cannot appear before the store is seen by another thread, leading to deadlock?

For volatile to work the way I intended, there has to be some way to insert a barrier that at least prevents the compiler from moving some types of instructions across it. Do any of the three compilers, Intel, Visual Studio, and gcc, have such a feature?
Re: Volatile Worthless or Not
>> "or in lieu of member function some new keyword/directive as in C++0x" I didn't spot that yet?
>> "BUT the behavior may not necessarily produce the desired result." Specifically? The compiler's coalescing optimisation you mentioned (let's discount statistics), or something else (too)?
>> "The same issue is involved with volatile with respect to alignment/atomicity" Such use seems dubious (not portable), even ignoring memory semantics.

Specifically the coalescing optimization, which may result in coalescing across a lengthy loop, in which case it can introduce unintended interlocks (assuming you forgot to use the designated member function for atomic, including the correct std::memory_order_...).

atomic appears to be written from the perspective of, and with a bias towards, the thread issuing the statements, as opposed to the threads observing the results. volatile, by contrast, appears impartial.

Coalescing optimizations are not always good. Assume you are on a processor with HT capability, and assume the hardware PREFETCHn instruction is either not implemented or not doing what you want it to do. You can split the thread processing into one thread that does the work and a second that monitors the progress of one or more volatile (non-coalescing) variables. This second thread can be "brainless", so to speak: it simply loads into a register from the addresses pointed to by each of the volatile variables whenever they change (and includes _mm_pause). Essentially the second HT thread does no work except ensuring that soon-to-be-accessed data is fresh and ready in L1 cache. This will work across page boundaries as well as through page faults.

In the above scenario, coalescing optimizations will thwart the intentions of the programmer. The above is also a simple example of synchronization through a memory mailbox, and coalescing optimizations interfere with such synchronization: the two (or more) threads cannot keep in lock-step if some of the steps cannot be observed.
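The helper-thread idea can be sketched as follows. This is a hypothetical illustration, not the code described above: run_with_helper, next, done and cpu_relax are invented names, the default look-ahead of 8 elements is arbitrary, and std::atomic with relaxed ordering stands in for volatile because, in ISO C++, a volatile variable written by one thread and read by another is a data race. Compilers are in principle allowed to coalesce such relaxed stores, which is exactly the concern raised above; mainstream compilers are generally understood not to do so. Whether the helper actually improves cache hit rates depends on the hardware and is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }                // spin-wait hint for the sibling HT thread
#else
inline void cpu_relax() { std::this_thread::yield(); }  // fallback on non-x86 targets
#endif

// Hypothetical sketch: the worker announces the address it will touch soon;
// a "brainless" helper loads from that address so the line is (hopefully)
// already in the shared L1 cache when the worker gets there.
long run_with_helper(const std::vector<long>& data, std::size_t ahead = 8) {
    assert(ahead > 0 && "look-ahead must point past the current element");
    std::atomic<const long*> next{nullptr};  // the "mailbox"
    std::atomic<bool> done{false};

    std::thread helper([&next, &done] {
        const long* last = nullptr;
        while (!done.load(std::memory_order_relaxed)) {
            const long* p = next.load(std::memory_order_relaxed);
            if (p != nullptr && p != last) {
                volatile long sink = *p;  // touch the data; the value is discarded
                (void)sink;
                last = p;
            }
            cpu_relax();
        }
    });

    long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i + ahead < data.size()) {
            next.store(&data[i + ahead], std::memory_order_relaxed);  // post to the mailbox
        }
        sum += data[i];  // the real work
    }
    done.store(true, std::memory_order_relaxed);
    helper.join();
    return sum;
}
```

The result is the same with or without the helper; any benefit would show up only as reduced stall time in the worker.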
Jim Dempsey
Re: Volatile Worthless or Not
"For volatile to work the way I intended, there has to be some way to insert a barrier that at least limits the compiler from moving some types of instructions across it. Out of the three compilers Intel, Visual Studio, and gcc, do any of them have such a feature?"
Unless you come up against a fiercely optimising compiler that won't let go even after linking, a portable compiler fence requires nothing more than a call to an external empty function. In g++, you can avoid the hassle of another source file, however trivial, and the overhead of making that call, however small, by using a blank inline-assembler statement that "clobbers" all memory. Both are commonly understood to provide the functionality you seek, at least between operations on data that is not just locally visible (unless the g++ variant even lifts that restriction).
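The compiler fences discussed above can be sketched like this. Assumptions: compiler_fence, shared_value and store_then_reload are hypothetical names; the GCC-style branch also covers Clang and Intel's compiler where they accept GCC inline assembly; on Visual C++ the intrinsic _ReadWriteBarrier (from <intrin.h>) served this purpose but is deprecated, so the sketch falls back on the C++11 std::atomic_signal_fence. The external-empty-function variant is not shown because it needs a separate source file. None of these emit a CPU fence instruction; they only restrain the compiler, so hardware reordering (on x86, store-load reordering) still needs a real fence such as std::atomic_thread_fence.

```cpp
#include <atomic>
#include <cassert>

// Compiler-only barrier: the compiler may not move memory accesses across it,
// but no CPU fence instruction is emitted.
inline void compiler_fence() {
#if defined(__GNUC__)
    // GCC, Clang, Intel (GCC-compatible mode): empty asm that "clobbers" all memory.
    __asm__ __volatile__("" ::: "memory");
#else
    // Portable C++11 equivalent; in practice compiled as a pure compiler barrier.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

int shared_value = 0;  // hypothetical shared variable

int store_then_reload(int v) {
    shared_value = v;     // must be written to memory before the barrier...
    compiler_fence();
    return shared_value;  // ...and is re-read from memory after it
}
```

Without the barrier, the compiler could keep shared_value in a register and return v directly; with it, both the store and the reload appear in the generated code.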