What are the effects of _mm_lfence and _mm_sfence?

simbalee

I have tried to use _mm_lfence and _mm_sfence to improve the performance of my program. However, it turned out to be 3 to 4 times slower.

In the Intel C++ Compiler 9.0 documentation, the function _mm_lfence is described as:

Guarantees that every load instruction that precedes, in
program order, the load fence instruction is globally visible before any load
instruction which follows the fence in program order.

And for _mm_sfence, the description is:

Guarantees that every memory access that precedes, in program order, the memory
fence instruction is globally visible before any memory instruction which
follows the fence in program order.

I find these descriptions hard to understand.
What does "globally visible" mean? How does it affect the behavior of a program? Could anyone kindly clarify? Thanks.

jimdempseyatthecove

In layman terms...


The C++ compiler produces object code, which upon linking becomes more or less binary code for the processor (it is actually in a loader format).


The processor reads instructions more or less as it follows the program counter. Most newer processors, upon reading a conditional branch, may read instructions from both paths of the branch.


As instructions come in for processing, they are examined to determine what their "arguments" are. Some are in memory, some are in registers, and some are part of the instruction. Some instructions do reads, some do writes, and some do read/modify/writes.


Many years ago, processors executed instructions in the order in which they were fetched from memory. Between then and now, someone (many people, in fact) observed that if the processor could reorder the reads and writes (as well as remember recently used data and instructions), performance could improve significantly. Thus began out-of-order reads and writes, as well as read combining and write combining.


For most (single-threaded) programs, the reordering does not affect the outcome. For multi-threaded programs that share data, order does make a difference. Order also makes a difference when reading from or writing to devices.


In addition to the processor performing out-of-order operations the C++ compiler can do much the same. A good example of this is moving statements that produce a constant result outside of a loop.
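
As a rough illustration of the loop example just mentioned, here is a minimal sketch. The function names sum_naive and sum_hoisted are made up for this post, and the second function only approximates what an optimizer might effectively generate from the first; actual compiler output will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// As written by the programmer: scale * offset never changes inside
// the loop, yet the source asks for it on every iteration.
double sum_naive(const std::vector<double>& v, double scale, double offset) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * (scale * offset);
    }
    return total;
}

// Roughly what the compiler is allowed to do: compute the invariant
// product once, before the loop, and reuse it.
double sum_hoisted(const std::vector<double>& v, double scale, double offset) {
    const double k = scale * offset;  // moved out of the loop
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * k;
    }
    return total;
}
```

Both versions return the same result. The point is only that the order of operations in the generated code need not match the order in the source.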


There are situations where such optimizations will produce problems.


An example is when a seemingly static memory location is being referenced by another thread or device. Such a variable should be declared volatile so that the compiler doesn't assume it can reuse a prior value read from the location (or defer writing to the location).
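
A minimal sketch of the volatile case, under some assumptions: g_ready and wait_until_ready are hypothetical names, and g_ready stands in for a location that another thread or a device would write. Because the flag is volatile, the compiler has to re-read it on every pass through the loop instead of reusing the first value it loaded.

```cpp
#include <cassert>

// Hypothetical flag that something outside this code path may change.
volatile int g_ready = 0;

// Polls the flag at most max_spins times and returns how many polls
// found it still clear. Each test of g_ready is a real load because
// the variable is volatile.
int wait_until_ready(int max_spins) {
    int spins = 0;
    while (g_ready == 0 && spins < max_spins) {
        ++spins;
    }
    return spins;
}
```

Note that volatile only constrains the compiler. It does not make the access atomic, and it does not order the access relative to other memory operations.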


In addition to volatility issues, the ordering of memory accesses is often important, as are atomic operations.


An example of this might be one thread of your program that reads an index, increments the index, writes the index, and then uses the index to write data. If the write of the index is reordered to occur after the write of the data, then a different thread might make a false assumption about the state of the memory. (This is probably not a good enough example.)


The _mm_?fence intrinsics therefore serve two purposes: 1) they inform the compiler that pending reads or writes must not be moved before or after the specified fence statement, and 2) they tell the compiler to insert an appropriate processor fence instruction or, lacking that, a function call that performs the equivalent fencing behavior.
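
Here is a minimal sketch of where the two fences might sit in a publish/read pattern, a variation on the index example above. It assumes a single writer and a compiler that treats the intrinsics as barriers, as described. The names g_data, g_count, publish, read_latest, store_fence and load_fence are all hypothetical. The non-x86 fallback is my own addition so the sketch builds elsewhere, and it is not a claim of equivalence.

```cpp
#include <atomic>
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_sfence
#include <emmintrin.h>  // _mm_lfence
inline void store_fence() { _mm_sfence(); }
inline void load_fence()  { _mm_lfence(); }
#else
// Assumption: build-only fallback for targets without these intrinsics.
inline void store_fence() { std::atomic_thread_fence(std::memory_order_release); }
inline void load_fence()  { std::atomic_thread_fence(std::memory_order_acquire); }
#endif

constexpr int kSlots = 8;      // hypothetical buffer size
volatile int g_data[kSlots];   // slots filled by the writer
volatile int g_count = 0;      // number of slots published so far

// Writer: store the data first, fence, then publish the new count,
// so the count never advertises a slot that is not yet written.
bool publish(int value) {
    const int i = g_count;
    if (i >= kSlots) return false;
    g_data[i] = value;
    store_fence();             // data store ordered before count store
    g_count = i + 1;
    return true;
}

// Reader: load the count first, fence, then load the data it covers.
bool read_latest(int* out) {
    const int n = g_count;
    if (n == 0) return false;
    load_fence();              // count load ordered before data load
    *out = g_data[n - 1];
    return true;
}
```

The fences are there for correctness, not speed: they restrict reordering, so adding them to code that does not need them can only cost time. On x86, ordinary stores to normal write-back memory are already kept in order with respect to other stores, so _mm_sfence matters mostly with non-temporal or write-combining stores.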


Globally visible means visible to any and all threads and processors on an SMP system. This may include multiple processor cards (one or several in a system), each with potentially multiple processor packages, each package with potentially multiple processor cores, and each core with potentially HyperThreading capability.


Someone from Intel might provide a link to a good reference on the subject.


Jim Dempsey


www.quickthreadprogramming.com
simbalee

Thank you, Jim. Your explanation is very clear and helpful.