Macro '__TBB_compiler_fence' is 'nop-ed' if TBB is compiled with Intel C/C++ compiler

Macro '__TBB_compiler_fence' is 'nop-ed' if TBB is compiled with the Intel C/C++ compiler.

Header file: windows_ia32.h

...
#if __INTEL_COMPILER
#define __TBB_compiler_fence() __asm { __asm nop }
#elif _MSC_VER >= 1300
extern "C" void _ReadWriteBarrier();
#pragma intrinsic( _ReadWriteBarrier )
#define __TBB_compiler_fence() _ReadWriteBarrier()
#else
...

Could somebody explain why the macro is 'nop-ed'?

Just have a look at what using _ReadWriteBarrier actually produces at the code/asm level:
it behaves like a NOP.

Quote:

It is important to understand that _ReadWriteBarrier
does not insert any additional instructions, and it does not prevent the CPU
from rearranging reads and writes; it only prevents the compiler from
rearranging them.

http://msdn.microsoft.com/en-us/library/ee418650%28v=vs.85%29.aspx
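
To make the distinction in the quote concrete, here is a minimal sketch, assuming a C++11 toolchain (which postdates the compilers discussed in this thread). It contrasts a compiler-only barrier, 'std::atomic_signal_fence', which plays roughly the role of '_ReadWriteBarrier', with 'std::atomic_thread_fence', which may also emit a fence instruction for the CPU. The names 'compiler_only_fence', 'full_fence', 'publish_and_read', 'g_payload' and 'g_ready' are hypothetical and are not part of TBB.

```cpp
#include <atomic>
#include <cassert>

int g_payload = 0;                 // plain data (hypothetical name)
std::atomic<bool> g_ready(false);  // flag that publishes the data (hypothetical name)

// Compiler-only barrier: typically emits no machine instruction, but the
// compiler may not move memory accesses across it.
inline void compiler_only_fence() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Compiler and hardware barrier: may emit a fence instruction, so it also
// constrains reordering done by the CPU.
inline void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Single-threaded demonstration of where each fence would sit.
int publish_and_read(int value) {
    g_payload = value;
    compiler_only_fence();  // keeps the compiler from sinking the store above
    g_ready.store(true, std::memory_order_release);
    full_fence();           // additionally orders the accesses for other cores
    return g_ready.load(std::memory_order_acquire) ? g_payload : -1;
}
```

Calling 'publish_and_read(42)' returns 42. The observable result is the same with or without the fences in a single thread; the difference only shows up in the generated assembly and in multi-threaded use.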

Quoting Maxym Dmytrychenko (Intel)...
It is important to understand that _ReadWriteBarrier does not insert any additional instructions
...

Interesting, because my local version of MSDN installed with Visual Studio 2005 doesn't have that
statement. I'll take a look at what code Visual Studio 2005, 2008 and 2010 generate for a
simple Test-Case.

Quoting Maxym Dmytrychenko (Intel)...
Just have a look at what using _ReadWriteBarrier actually produces at the code/asm level:
it behaves like a NOP.

[SergeyK] Yes, I did look.

Quote:

It is important to understand that _ReadWriteBarrier does not insert any additional instructions

[SergeyK] Yes, I confirm this. It rather "ignores" some instructions that could create an Access
Violation.

...and it does not prevent the CPU from rearranging reads and writes; it only prevents the compiler from rearranging them.

http://msdn.microsoft.com/en-us/library/ee418650%28v=vs.85%29.aspx

It looks like a very tricky and unreliable feature in terms of portability between different
platforms and C/C++ compilers. The MS C/C++ compiler ( Visual Studio 2005 ) has NOT generated a 'nop'
assembler instruction. Also, that feature is NOT available if ALL optimizations are disabled.

I wouldn't rely on the '_ReadWriteBarrier' intrinsic function in a highly portable C/C++ library
because it doesn't force a software developer to fix a possible problem in code that could create
an Access Violation, like:

...
RTint *pData = RTnull;
g_iVariable = *pData;
g_iVariable = 7;
...

I understand that software developers on the TBB project could have some different considerations
regarding the '_ReadWriteBarrier' intrinsic function.

Two Test-Cases are provided; take a look if you're interested:

Note: I tested with Visual Studio 2005

...
#include

#pragma intrinsic( _ReadWriteBarrier )

RTint g_iVariable = 0; // Must be Declared as global!
...

Test-Case 1 - _USE_READWRITEBARRIER is NOT defined

...
// #define _USE_READWRITEBARRIER

// Test-Case 1
{
RTint *pData = RTnull; // Instruction is NOT generated
g_iVariable = *pData; // Instruction is NOT generated
#if defined ( _USE_READWRITEBARRIER )
_ReadWriteBarrier();
#endif
g_iVariable = 7;
0040189F mov dword ptr [g_iVariable (5F29C4h)], 7 // Instruction is generated
}
...

Test-Case 2 - _USE_READWRITEBARRIER is defined

...
#define _USE_READWRITEBARRIER

// Test-Case 2
{
RTint *pData = RTnull;
0040189C xor eax, eax
g_iVariable = *pData;
0040189E mov eax, dword ptr [eax] // Unhandled exception error ( see below )
004018A0 add esp, 0Ch
004018A3 mov dword ptr [g_iVariable (5F29C4h)], eax
#if defined ( _USE_READWRITEBARRIER )
_ReadWriteBarrier();
#endif
g_iVariable = 7;
004018A8 mov dword ptr [g_iVariable (5F29C4h)], 7
}
...

Unhandled exception error:

Unhandled exception at 0x0040189e in ScaLibTestAppD.exe: 0xC0000005: Access violation
reading location 0x00000000.

Some of my C/C++ compiler command line options are as follows:

/O2 /GF /Gm /EHsc /MTd /openmp /W4 /nologo /c /Zi /TP /errorReport:prompt

Optimization for Speed
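
The portability concern raised in this post can be illustrated with a minimal sketch of a wrapper macro that selects a compiler-only fence per toolchain. This is not the TBB implementation; 'EXAMPLE_COMPILER_FENCE', 'store_twice' and 'g_value' are hypothetical names, and the chosen version checks are assumptions (C++11 or Visual Studio 2012 and later for the standard '<atomic>' header, '<intrin.h>' for older Microsoft compilers, GCC-style inline assembly otherwise).

```cpp
#include <cassert>

// Pick a compiler-only fence for the current toolchain (assumed mapping).
#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1700)
  #include <atomic>
  #define EXAMPLE_COMPILER_FENCE() std::atomic_signal_fence(std::memory_order_seq_cst)
#elif defined(_MSC_VER)
  #include <intrin.h>
  #pragma intrinsic(_ReadWriteBarrier)
  #define EXAMPLE_COMPILER_FENCE() _ReadWriteBarrier()
#elif defined(__GNUC__)
  // Empty asm statement with a "memory" clobber: no instruction is emitted.
  #define EXAMPLE_COMPILER_FENCE() __asm__ __volatile__("" ::: "memory")
#else
  #error "No compiler fence known for this toolchain"
#endif

int g_value = 0;  // global, as in the Test-Cases above

// Two stores separated by the fence. With the fence in place the compiler is
// not expected to treat the first store as dead and merge it into the second.
int store_twice(int first, int second) {
    g_value = first;
    EXAMPLE_COMPILER_FENCE();
    g_value = second;
    return g_value;
}
```

For example, 'store_twice(1, 7)' returns 7 and leaves 'g_value' at 7. Whether both stores actually appear in the generated code depends on the compiler and the optimization level, which is the behavior the two Test-Cases compare.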

This whole thread illustrates why internal features should not be accessed directly: besides not being supported across versions, they are also undocumented, so using them requires relevant knowledge and understanding (there is no issue here).

The example further illustrates why nonoptimised as well as optimised builds should always be tested.

Thank you guys for your feedback!
