#pragma omp atomic vs InterlockedDecrement

vasci_ wrote:
#ifdef _USEWIN32LOCKAPI
	InterlockedDecrement(&refs_);
#else
#pragma omp atomic
	refs_--;
#endif

My benchmarks show that InterlockedDecrement is much faster than #pragma omp atomic.
Why? I would have expected the compiler to generate inline code here.
Compiler: Composer XE 2011, 64-bit Windows.
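
A minimal sketch of how such a comparison could be timed, not the benchmark actually used for the numbers above. All names here (omp_refs, std_refs, ns_per_call) are hypothetical, and std::atomic (C++11) stands in for InterlockedDecrement so that the code also builds outside Windows. It times calls on a single thread, so it only shows the uncontended per-call cost.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical shared counters, standing in for refs_ in the post.
static long omp_refs = 0;
static std::atomic<long> std_refs{0};

// Decrement via the OpenMP atomic construct.
// If OpenMP is disabled, the pragma is ignored and this is a plain decrement.
void omp_decrement()
{
#pragma omp atomic
    omp_refs--;
}

// Decrement via std::atomic (assumption: used here in place of InterlockedDecrement).
void std_decrement()
{
    std_refs.fetch_sub(1, std::memory_order_seq_cst);
}

// Call op() `iterations` times on the calling thread and
// return the mean wall-clock time per call in nanoseconds.
template <typename Op>
double ns_per_call(Op op, long iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        op();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling ns_per_call(omp_decrement, n) and ns_per_call(std_decrement, n) with a large n, with OpenMP enabled, gives two figures that can be compared directly. A contended multi-threaded test may behave differently.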
om-sachan (Intel) wrote:

Could you please provide sample code that we can compile to review the issue?

Om

vasci_ wrote:

This is my "sample" code. I am using Composer XE 2011 Update 3 64-bit compiler.

This is pretty simple.

If you generate an assembly listing, the OpenMP code calls

__kmpc_global_thread_num and
__kmpc_atomic_fixed4_add

while the "Windows" code appears to be generated as inline assembly.

#include <windows.h>
LONG refs_=0;

void WINatomicAdd()
{

	InterlockedIncrement(&refs_);
}

void OMPatomicAdd()
{
#pragma omp atomic
  ++refs_;
}
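
One possible way to keep the inline path where it is available and fall back to the OpenMP construct elsewhere is sketched below. This is only an illustration under assumptions: release_ref and ref_count_t are hypothetical names, and the __atomic_sub_fetch branch assumes a GCC-compatible compiler.

```cpp
#if defined(_WIN32)
#include <windows.h>
typedef LONG ref_count_t;   // InterlockedDecrement operates on LONG
#else
typedef long ref_count_t;
#endif

// Hypothetical helper: atomically decrement *refs with the cheapest
// primitive the platform offers.
inline void release_ref(volatile ref_count_t* refs)
{
#if defined(_WIN32)
    // Win32 API call.
    InterlockedDecrement(refs);
#elif defined(__GNUC__) || defined(__clang__)
    // GCC/Clang builtin (assumption: a GCC-compatible compiler).
    __atomic_sub_fetch(refs, 1, __ATOMIC_SEQ_CST);
#else
    // Portable fallback: OpenMP atomic update.
#pragma omp atomic
    (*refs)--;
#endif
}
```

With this shape the call site stays the same on every platform, for example release_ref(&refs_), and the choice of primitive is made once at compile time.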


om-sachan (Intel) wrote:

It looks like the OpenMP atomic is slower. You may use InterlockedIncrement instead.
