_mm_sfence and memory barriers

David W.

In another thread in this forum (http://software.intel.com/en-us/forums/topic/305582), there was a comment:

The _mm_?fence therefore serves two purposes: 1) to inform the compiler that pending reads or writes must not be moved before or after the specified fence statement; and 2) to have the compiler insert an appropriate processor fence instruction or, lacking that, a function call that performs the equivalent fencing behavior.

My question is, is there an authoritative source for #1?  I have yet to find a credible reference that says that _mm_mfence generates a compiler ReadWriteBarrier.

Sergey Kostrov

>>...My question is, is there an authoritative source for #1?..

Please take a look at:

1.
Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 2 (2A, 2B & 2C):
Instruction Set Reference, A-Z

2.
Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3 (3A, 3B & 3C):
System Programming Guide

3.
MSDN
...
void _mm_lfence(void)

Guarantees that every load instruction that precedes, in program order, the load fence instruction is globally visible before any load instruction that follows the fence in program order.

void _mm_mfence(void)

Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction that follows the fence in program order.
...
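For reference, here is a minimal sketch of where the three intrinsics typically go. It assumes an x86 target and a compiler that ships the usual `<xmmintrin.h>`/`<emmintrin.h>` headers. The variables and function names are hypothetical. On x86, ordinary stores are already kept in order relative to each other, so `_mm_sfence` matters mainly for non-temporal stores. Whether the compiler itself may move `payload = value` across the fence is exactly the question this thread asks.

```c
#include <xmmintrin.h>  /* _mm_sfence (SSE) */
#include <emmintrin.h>  /* _mm_lfence, _mm_mfence (SSE2) */

/* Hypothetical shared state for illustration. */
static int payload;
static volatile int ready;

/* Producer: the payload store should become visible before the flag store. */
void publish(int value)
{
    payload = value;
    _mm_sfence();          /* SFENCE: earlier stores are globally visible first */
    ready = 1;
}

/* Consumer: returns 1 and copies the payload once the flag is set. */
int try_consume(int *out)
{
    if (!ready)
        return 0;
    _mm_lfence();          /* LFENCE: later loads wait for the flag load */
    *out = payload;
    return 1;
}

/* A store followed by a load of a different location: of the three fence
   instructions, only MFENCE orders this store->load pair on x86. */
int store_then_load(int *mine, const int *theirs, int value)
{
    *mine = value;
    _mm_mfence();          /* MFENCE: the store is globally visible before the load */
    return *theirs;
}
```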

David W.

Quote:

Sergey Kostrov wrote:

void _mm_mfence(void)

Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction that follows the fence in program order.

Hey Sergey, thanks for the response.

I have seen those passages.  However, they don't actually answer the question.  For example, let's say that *all* you wanted to document was the MFENCE opcode.  That text would work very well for that.  Now, what if (as I believe) _mm_mfence performed both MFENCE + _ReadWriteBarrier and you wanted to document that?  That same text would work for that as well.

I see nothing in that text that excludes either case.

Sergey Kostrov

Hi David,

Wouldn't it be better to take a practical approach? In 2012 I created a very small test case (using just the _ReadWriteBarrier function) and I could post it. However, I'm not sure it will help you. Let me know if you need the test case and I'll find it.

David W.

What I was hoping for was some authoritative source I could quote.

As for the sample, that's an interesting question.  After pondering questions about how this works, it seems like the compiler itself must "special case" the _mm_?fence instructions to generate both the ?FENCE opcode, and (as I believe) the implicit ReadWriteBarrier associated with it.  And if the compiler is doing it, then conceivably different compilers could handle this differently.  So your sample may or may not show what you think it does, depending on exactly what compiler I'm using.

So while I appreciate your offer, I don't think your sample is going to get me what I need.

Again, if there were some authoritative source that said "calls to _mm_mfence implicitly do a ReadWriteBarrier," then developers would know what to expect, and compiler writers would know what to write. Instead we have vagueness in an area that is already notoriously complex.

I'm still holding out hope that an Intel Compiler developer may chime in here and at least describe what their product does.

Sergey Kostrov

>>...Again, if there were some authoritative source that said "calls to _mm_mfence implicitly do a ReadWriteBarrier,"
>>then developers would know what to expect, and compiler writers would know what to write. Instead we have vagueness
>>in an area that is already notoriously complex.

I agree that documentation is sometimes too fuzzy, and I think a forum dedicated to Intel manuals, user guides, references, etc. would really help improve the quality of technical information. If you find a reference on the Intel website for the _mm_?fence and _ReadWriteBarrier intrinsic functions, look on that web page for a Feedback link and post your comments / suggestions there. I know that all these posts are monitored.

jimdempseyatthecove

The _ReadWriteBarrier intrinsic function is a compiler directive, no different from a #pragma. It ensures that the compiler does not move reads or writes that appear in the source code on one side of the _ReadWriteBarrier to the other side. No code is inserted, other than writing back values that the compiler may have been holding in registers. Also note that compiler optimizations (IMHO) are still free to remove reads or writes on either side of the _ReadWriteBarrier, provided they are otherwise permitted to do so.
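Standard C11 has a close analogue of this compiler-only vs. hardware distinction. The sketch below swaps in C11 `<stdatomic.h>` fences rather than the MSVC/Intel intrinsics. In practice, GCC and Clang compile `atomic_signal_fence(memory_order_seq_cst)` to no instruction while still refusing to move memory accesses across it. They compile `atomic_thread_fence(memory_order_seq_cst)` to a hardware fence (typically MFENCE or a locked instruction on x86) plus the same compiler constraint. The macro and variable names are hypothetical.

```c
#include <stdatomic.h>

/* Compiler-only fence: constrains compiler reordering and normally emits no
   CPU instruction (comparable in spirit to _ReadWriteBarrier). */
#define COMPILER_ONLY_FENCE() atomic_signal_fence(memory_order_seq_cst)

/* Compiler constraint plus a hardware fence (comparable to an _mm_mfence
   that also acts as a compiler barrier). */
#define FULL_FENCE() atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical shared variables. */
static int shared_a, shared_b;

/* In practice the compiler emits the store to shared_a before the store to
   shared_b; no fence instruction is generated. */
void write_pair(int a, int b)
{
    shared_a = a;
    COMPILER_ONLY_FENCE();
    shared_b = b;
}

/* Store, then load a different location, with a CPU-level fence between. */
int write_then_read(int a)
{
    shared_a = a;
    FULL_FENCE();
    return shared_b;
}
```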

I sympathize with you in that the various fences and barriers (software and hardware) should be unambiguously specified, with code examples accompanied by comments that explain things clearly in terms a layman could understand (as opposed to someone who already fully understands whatever the writer of the example assumed).

Jim Dempsey

www.quickthreadprogramming.com

David W.

Hello Mr. Cove, I was hoping to hear from you.  Thanks for your response.

Yes, I do understand the purpose for _ReadWriteBarrier().  I don't believe a sample is appropriate on the link that Sergey sent.  Ideally what I'd like to see is for the docs to add one tiny, but hugely clarifying phrase to each of:

  • _mm_lfence: Performs an implicit _ReadBarrier()
  • _mm_sfence: Performs an implicit _WriteBarrier()
  • _mm_mfence: Performs an implicit _ReadWriteBarrier()

Assuming that they do in fact do so. 

Consider for a moment what happens if they don't:

_mm_mfence();

_ReadWriteBarrier();

If _mm_mfence *doesn't* imply a barrier, then the compiler is free to move statements that precede it to a point after it but before the ReadWriteBarrier. This completely defeats the purpose of creating an MFENCE at all.

In theory you could wrap the _mm_mfence with barriers on both sides. However, compilers don't necessarily see the world as you or I would, so I would not feel 100% confident that this has the same effect. And besides, the whole thing becomes unnecessary if (as seems likely) the Intel compiler does the sensible thing here.
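For GCC and Clang, the wrap-it-on-both-sides idea can be sketched as follows. This assumes GNU inline-assembly syntax, where an empty `asm volatile` with a `"memory"` clobber is a compiler-only barrier that emits no instruction. The macro and function names are hypothetical. If the compiler already treats `_mm_mfence` as a barrier, the two extra barriers are redundant but harmless.

```c
#include <emmintrin.h>  /* _mm_mfence */

/* GNU-style compiler barrier: no instruction is emitted, but the compiler
   must assume any memory may have changed. It therefore cannot move loads or
   stores across it or keep memory values cached in registers across it. */
#define GNU_COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Compiler barrier, hardware MFENCE, compiler barrier. */
static inline void guarded_mfence(void)
{
    GNU_COMPILER_BARRIER();
    _mm_mfence();
    GNU_COMPILER_BARRIER();
}

/* Dekker-style step: announce intent, then check the other side's flag. */
int announce_then_check(int *mine, const int *theirs)
{
    *mine = 1;
    guarded_mfence();
    return *theirs;
}
```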

But they need to SAY they did it.  Or say they didn't and describe how people should cope.

Sergey Kostrov

>>...Ideally what I'd like to see is for the docs to add one tiny, but hugely clarifying phrase to each of:
>>
>>- _mm_lfence: Performs an implicit _ReadBarrier()
>>- _mm_sfence: Performs an implicit _WriteBarrier()
>>- _mm_mfence: Performs an implicit _ReadWriteBarrier()
>>
>>Assuming that they do in fact do so...

David, here is the link:

http://www.intel.com/software/products/softwaredocs_feedback

Please leave your comments there.

jimdempseyatthecove

At issue here is not only compiler documentation but also C++ standards compliance. IOW, if vendor XYZ documents and implements an implicit _????Barrier() for _mm_?fence(), will vendor ABC implement the implicit barrier too?

Due to this uncertainty, it appears that you will need a macro, defined in a header you supply, that works out what to do depending on the compiler and processor/platform.
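A minimal sketch of such a header follows. It assumes an x86 target and relies only on the compiler-identification macros `__GNUC__`, `__clang__` and `_MSC_VER`. The `PORTABLE_*` and `FENCE_STRATEGY` names are hypothetical. Each branch places a compiler barrier on both sides of the hardware fence, so the result does not depend on whether a given compiler's `_mm_mfence` happens to imply one. The Intel compiler usually defines `__GNUC__` on Linux and `_MSC_VER` on Windows, so it lands in one of the first two branches.

```c
#if defined(__GNUC__) || defined(__clang__)
  #include <emmintrin.h>            /* _mm_mfence */
  #define PORTABLE_RW_BARRIER() __asm__ __volatile__("" ::: "memory")
  #define PORTABLE_CPU_MFENCE() _mm_mfence()
  #define FENCE_STRATEGY 1          /* GNU inline asm + SSE2 intrinsic */
#elif defined(_MSC_VER)
  #include <intrin.h>               /* _ReadWriteBarrier */
  #include <emmintrin.h>            /* _mm_mfence */
  #define PORTABLE_RW_BARRIER() _ReadWriteBarrier()
  #define PORTABLE_CPU_MFENCE() _mm_mfence()
  #define FENCE_STRATEGY 2          /* MSVC intrinsics */
#else
  #include <stdatomic.h>
  #define PORTABLE_RW_BARRIER() atomic_signal_fence(memory_order_seq_cst)
  #define PORTABLE_CPU_MFENCE() atomic_thread_fence(memory_order_seq_cst)
  #define FENCE_STRATEGY 3          /* C11 fallback */
#endif

/* Barrier on both sides of the hardware fence, whatever the compiler does. */
#define PORTABLE_MFENCE()        \
    do {                         \
        PORTABLE_RW_BARRIER();   \
        PORTABLE_CPU_MFENCE();   \
        PORTABLE_RW_BARRIER();   \
    } while (0)

/* Reports which branch was selected at compile time. */
int fence_strategy(void) { return FENCE_STRATEGY; }

int store_fence_load(int *out, const int *in, int value)
{
    *out = value;
    PORTABLE_MFENCE();
    return *in;
}
```

The `do { ... } while (0)` wrapper lets `PORTABLE_MFENCE();` be used anywhere a single statement is expected, including an unbraced `if`.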

Jim Dempsey

www.quickthreadprogramming.com

David W.

> you will be required to ... figure out just what to do dependent on compiler

It sounds reasonable when you say it fast.  However this thread illustrates the intrinsic problem with the proposal: If no one will tell me what their compiler does (via docs, forums, etc), how do you create a header?

At this point, it's become clear that Intel's chief architect for the compiler team is unlikely to happen by and describe to me how the Intel compiler handles these specific intrinsics.  I have posted a message to the form Sergey suggested.  I'm sure they'll resolve this soon.

Even though we haven't come to a resolution here, I'd like to thank both of you for your responses.

jimdempseyatthecove

David, when you are unable to get an answer, construct a proper workaround:

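#include <intrin.h>     // MSVC: _ReadBarrier, _WriteBarrier, _ReadWriteBarrier
#include <emmintrin.h>  // _mm_lfence, _mm_sfence, _mm_mfence

// do { ... } while (0) makes each macro behave like a single statement
#define _MM_LFENCE() do { _mm_lfence(); _ReadBarrier(); } while (0)
#define _MM_SFENCE() do { _mm_sfence(); _WriteBarrier(); } while (0)
#define _MM_MFENCE() do { _mm_mfence(); _ReadWriteBarrier(); } while (0)
...
_MM_LFENCE();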

How many times have you had to define INT8 because of uncertainty about __int8, int8_t, int8, or char?

Jim Dempsey

www.quickthreadprogramming.com

David W.

Quoting myself from above:

If _mm_mfence *doesn't* imply a barrier, then the compiler is free to move statements that precede it to a point after it but before the ReadWriteBarrier. This completely defeats the purpose of creating an MFENCE at all.

And while I haven't run into the INT8 issue you describe, I did just bump into a case where __LONG32 isn't 32bits.

Sergey Kostrov

>>...And while I haven't run into the INT8 issue you describe, I did just bump into a case where __LONG32 isn't 32bits.

Where have you seen this? Please provide technical details on how it is defined.

Long type always must be 4 bytes ( 32-bits ) for signed and unsigned values.
...
typedef long _RTV64C RTlong;
typedef unsigned long RTulong;
...
Where _RTV64C is:
...
#define _RTV64C __w64
...
CrtPrintf( RTU("\tRTlong - %2d\n"), sizeof( RTlong ) ); // 4 4 4 4 4 4 4
...
where the 4s, from left to right, are for:

- WIN32 MSC VS20xx - 4 bytes
- WIN32 CE MSC VS20xx - 4 bytes
- WIN32 MSC VS98 - 4 bytes
- WIN32 ICC - 4 bytes
- WIN32 MinGW - 4 bytes
- WIN32 BCC - 4 bytes
- WIN32 TCC - 4 bytes Note: 23-year-old legacy C/C++ compiler

PS: David, I've spent a lot of time making that code as portable and as compatible as possible. So if you saw a __LONG32 that isn't 32 bits, then something is very wrong.

David W.

Long type always must be 4 bytes ( 32-bits ) for signed and unsigned values.

While this is true in the Windows world, other OSs running on x86-64 (like Linux) have chosen different paths. For example, check out the first answer on (http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-o...). It gives an excellent explanation of the differences between Linux's LP64 and MS's LLP64.

Sergey Kostrov

>>...While this is true in the Windows world, other OSs running on the i386 (like linux) have chosen different paths...

This is Not just for Windows OSs, desktop or embedded, and you're mixing different types, that is, long and long int.

>>...differences between linux's LP64 and MS's LLP64...

Microsoft doesn't use LLP64 at all (!) and you're talking about different types for a 64-bit "world". Here is a quote from GCC docs:
...
`__LP64__'
`_LP64'
These macros are defined, with value 1, if (and only if) the
compilation is for a target where 'long int' and pointer both use
64-bits...
...
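One way to settle which data model a given compiler uses is to ask it directly. This sketch, using hypothetical names, classifies the target from `sizeof`. It also cross-checks against the `__LP64__`/`_LP64` macros quoted above and MSVC's `_WIN64` macro. `_Static_assert` assumes a C11 compiler. On 64-bit Linux with GCC it classifies the target as LP64; with 64-bit MSVC it would classify it as LLP64.

```c
enum data_model { DM_ILP32, DM_LLP64, DM_LP64, DM_OTHER };

/* Classify the target by the sizes the compiler actually uses. */
enum data_model detect_data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 8)
        return DM_LP64;   /* long and pointers are 64-bit */
    if (sizeof(void *) == 8 && sizeof(long) == 4 && sizeof(long long) == 8)
        return DM_LLP64;  /* long stays 32-bit; long long and pointers are 64-bit */
    if (sizeof(void *) == 4 && sizeof(int) == 4 && sizeof(long) == 4)
        return DM_ILP32;  /* int, long and pointers are 32-bit */
    return DM_OTHER;
}

const char *data_model_name(enum data_model m)
{
    switch (m) {
    case DM_ILP32: return "ILP32";
    case DM_LLP64: return "LLP64";
    case DM_LP64:  return "LP64";
    default:       return "other";
    }
}

/* Compile-time cross-checks against the predefined macros. */
#if defined(__LP64__) || defined(_LP64)
_Static_assert(sizeof(long) == 8 && sizeof(void *) == 8,
               "__LP64__ implies 64-bit long and pointers");
#elif defined(_WIN64)
_Static_assert(sizeof(long) == 4 && sizeof(void *) == 8,
               "_WIN64 implies 32-bit long and 64-bit pointers");
#endif
```

Calling `printf("%s\n", data_model_name(detect_data_model()));` with each compiler in question gives a direct answer instead of relying on documentation.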

David W.

you're mixing different types, that is long and long int.

This code:

   printf("%d\n", (int)sizeof(long));

Will print 4 whether it is compiled with 32bit or 64bit versions of MSVC.  However, on 64bit linux, you get 8.  There are all kinds of articles that describe this fact (for example http://www.unix.org/version2/whatsnew/lp64_wp.html).

Microsoft doesn't use LLP64 at all

They say they do: http://msdn.microsoft.com/en-us/library/windows/desktop/aa384083%28v=vs....

There's also wikipedia (http://en.wikipedia.org/wiki/LLP64#64-bit_data_models).

Sergey Kostrov

>>...Will print 4 whether it is compiled with 32bit or 64bit versions of MSVC...

This is absolutely correct for all Windows platforms and for 8 different C++ compilers I use.

jimdempseyatthecove

We are drifting off the topic of this thread (_mm_sfence and memory barriers).

btw>>Long type always must be 4 bytes ( 32-bits ) for signed and unsigned values.

From Turbo C User's Guide V 2.0 (1988):

"The actual sizes of short, int, and long are dependent upon the implementation; all that C guarantees is that a variable of type short will not be larger (that is, will not take up more bytes) than one of type long. In Turbo C, these types occupy 16 bits (short), 16 bits (int), and 32 bits (long)."

Granted, in ~1988 this was referring to a compiler targeted at a 16-bit platform (hence int being 16 bits). The point being that in C, the only restriction relating to sizes was that type short could not be larger than type long. No guarantee of exact sizes at all.
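For comparison, here is what standard C does pin down, written as compile-time checks. They assume C11 for `_Static_assert`, and `LLONG_MAX` needs C99. The standard sets minimum ranges through `<limits.h>` and requires each of short, int, long and long long to cover at least the range of the one before it. It does not fix exact byte sizes. The function name is hypothetical.

```c
#include <limits.h>

/* Minimum ranges required by the C standard. */
_Static_assert(SHRT_MAX >= 32767, "short has at least 16 bits");
_Static_assert(INT_MAX >= 32767, "int has at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long has at least 32 bits");
_Static_assert(LLONG_MAX >= 9223372036854775807LL, "long long has at least 64 bits");

/* Each type's range includes the previous one's. */
_Static_assert(SHRT_MAX <= INT_MAX && INT_MAX <= LONG_MAX && LONG_MAX <= LLONG_MAX,
               "ranges are non-decreasing from short to long long");

/* The storage size of long is implementation-defined: 32 bits on Windows
   (LLP64) and typical 32-bit targets, 64 bits on LP64 targets such as
   64-bit Linux. */
int long_size_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```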

Jim Dempsey

www.quickthreadprogramming.com

Igor Levicki

It is important to understand that x86 fencing instructions deal with architectural state, preventing the CPU from executing reads and writes on either side of the instruction out of the specified program order -- they do not imply or guarantee in any way that read and write instructions won't be reordered by the compiler.
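This distinction can be made concrete with GNU inline assembly, assuming GCC or Clang on x86. The macro and function names are hypothetical. An extended `asm volatile` containing `mfence` but no `"memory"` clobber orders memory in the CPU, yet tells the compiler nothing about memory. Ordinary loads and stores around it may therefore still be moved or cached by the optimizer. Adding the `"memory"` clobber gives the same instruction plus a compiler barrier.

```c
/* CPU-only fence: the hardware orders memory around MFENCE, but with no
   "memory" clobber the compiler may still move ordinary loads and stores
   across this statement. */
#define MFENCE_CPU_ONLY() __asm__ __volatile__("mfence" : :)

/* Same instruction plus a compiler barrier: the "memory" clobber forbids
   moving or caching memory accesses across it. */
#define MFENCE_CPU_AND_COMPILER() __asm__ __volatile__("mfence" : : : "memory")

static int cell;

/* Not a reliable ordering point: the compiler does not have to keep the
   store to cell and the load of *peer on their respective sides of the fence. */
int store_fence_load_weak(int v, const int *peer)
{
    cell = v;
    MFENCE_CPU_ONLY();
    return *peer;
}

/* Reliable: the compiler must complete the store before the fence and
   perform the load after it. */
int store_fence_load_strong(int v, const int *peer)
{
    cell = v;
    MFENCE_CPU_AND_COMPILER();
    return *peer;
}
```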

However, I do agree with you that intrinsic _should_ imply barrier for the compiler code reordering and I suggest submitting a feature request to Premier Support. My personal opinion is that Intel won't change it because they are blindly copying Microsoft in every compiler feature be it good or bad.

-- Regards, Igor Levicki

David W.

It is important to understand

Actually, I do understand this.  The MFENCE opcode only affects the processor, not the compiler.  However, the (unanswered) question is: What does _mm_mfence affect?  Conceivably it could just be a wrapper for MFENCE.  Or it could also imply a barrier.  Since this is done as a compiler intrinsic every place I've seen, only the compiler writers can say what they did.

Since I started this thread, I have been talking with the GCC people to see what they do.  It turns out that they do an implicit _ReadWriteBarrier with their _mm_mfence (although you have to look hard to find it).

I asked in the MS forum about what they do, but received no useful answers.  I have since opened a bug (https://connect.microsoft.com/VisualStudio/feedback/details/790233/mm-mf...) saying that _mm_mfence should have a barrier.  I said that the reason I don't believe it does is that the docs don't say it does.  If (as code inspection suggests) they really do the right thing here, hopefully this will turn into a doc bug.

I do agree with you that intrinsic _should_ imply barrier

I thank you for this.  Not everyone I've talked to seems to feel this way, but to me it's obvious.

I suggest submitting a feature request to Premier Support.

I am not currently on any paid support program with Intel.  Would they accept feature requests from me?  Link?  Since I'm hoping this is just a simple doc omission, I have sent a doc request to the link Sergey provided.  No response so far.

Intel won't change it because they are blindly copying Microsoft

Actually, this may be a good thing, since MS *appears* to be doing the right thing here.  But even if everyone is doing the wrong thing, writing down what you've done seems like the least you could do.

David W.

Just to wrap this up...

The implementations of the _mm_?fence instructions are all compiler specific.  Whether they correctly perform barriers is dependent on the compiler implementation:

I post this here in case some future google search sends someone here.

FWIW

Sergey Kostrov

>>...The implementations of the _mm_?fence instructions are all compiler specific...

Sorry, No. The same instructions are used.

However, not every C/C++ compiler supports them. For example, with the legacy Turbo C++ v3.x compiler (22 years old), I use the following piece of code:
...
//*** Irt?fence ***//
#define IrtSfence() { __emit__( 0x0F, 0xAE, 0xF8 ); }  // SFENCE
#define IrtLfence() { __emit__( 0x0F, 0xAE, 0xE8 ); }  // LFENCE
#define IrtMfence() { __emit__( 0x0F, 0xAE, 0xF0 ); }  // MFENCE
...
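With GCC or Clang on x86, the same opcode bytes can be emitted with the assembler's `.byte` directive. This is a sketch under that assumption, and the `Gnu*fence` names are hypothetical. The `"memory"` clobber is what makes each macro a compiler barrier as well as a CPU fence. The bytes alone tell the optimizer nothing, which is the distinction between the opcode and what an intrinsic may additionally do.

```c
/* Opcode bytes (same as the __emit__ macros):
   0F AE F8 = SFENCE, 0F AE E8 = LFENCE, 0F AE F0 = MFENCE.
   The "memory" clobber also makes each one a compiler barrier. */
#define GnuSfence() __asm__ __volatile__(".byte 0x0f, 0xae, 0xf8" ::: "memory")
#define GnuLfence() __asm__ __volatile__(".byte 0x0f, 0xae, 0xe8" ::: "memory")
#define GnuMfence() __asm__ __volatile__(".byte 0x0f, 0xae, 0xf0" ::: "memory")

/* Write a value, fence, read it back: exercises each fence kind once. */
int fenced_roundtrip(int *slot, int value)
{
    *slot = value;
    GnuSfence();
    GnuMfence();
    int seen = *slot;
    GnuLfence();
    return seen;
}
```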

David W.

Hello Sergey.

Had I said "the implementations of SFENCE are all compiler specific", you would have been correct.  But _mm_sfence can (and sometimes does) perform implicit barriers IN ADDITION to emitting the SFENCE opcode.

Whether a compiler does that while processing the _mm_sfence is a decision that only the compiler can make.

For that reason, I stand by my statement that "The implementations of the _mm_?fence instructions are all compiler specific." 

 

Jennifer J. (Intel)

DavidW and all,

The Intel Compiler treats the _mm_mfence, _mm_lfence, and _mm_sfence intrinsics as ReadWriteBarrier, ReadBarrier, and WriteBarrier, respectively. Hope this clears up any confusion.

Thanks,

Jennifer
