#pragma omp atomic vs InterlockedDecrement

#ifdef _USEWIN32LOCKAPI
	InterlockedDecrement(&refs_);
#else
#pragma omp atomic
	refs_--;
#endif

My benchmarks show that InterlockedDecrement is much faster than #pragma omp atomic.
Why? I would have thought the compiler could generate inline code here.
Compiler: Composer XE 2011, 64-bit Windows.

Could you please provide sample code that we can compile to review the issue?

Om

This is my "sample" code. I am using Composer XE 2011 Update 3 64-bit compiler.

This is pretty simple.

If you generate an assembly listing, the OpenMP code calls

__kmpc_global_thread_num and
__kmpc_atomic_fixed4_add

while the "Windows" code seems to be done inline.

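#include <windows.h>   // LONG, InterlockedIncrement

LONG refs_ = 0;

// Atomic increment through the Win32 API.
void WINatomicAdd()
{
	InterlockedIncrement(&refs_);
}

// Atomic increment through OpenMP.
void OMPatomicAdd()
{
#pragma omp atomic
	++refs_;
}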
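
For anyone who wants to reproduce the comparison, here is a minimal timing sketch, not the benchmark used above. The names timeCalls, intrinsicAdd and ompAdd are made up for illustration. Off Windows it assumes the GCC/Clang builtin __sync_fetch_and_add as a stand-in for the Interlocked API. It times calls from a single thread only, so it measures per-call overhead (inline instruction versus runtime call), not behaviour under contention.

```cpp
#include <cassert>
#include <chrono>
#ifdef _WIN32
#include <windows.h>
static LONG counter = 0;   // same type as refs_ in the sample above
#else
static long counter = 0;   // assumption: plain long when not on Windows
#endif

// Path 1: intrinsic-style atomic increment.
static void intrinsicAdd()
{
#ifdef _WIN32
    InterlockedIncrement(&counter);
#else
    // Assumption: GCC/Clang builtin used in place of the Win32 call.
    __sync_fetch_and_add(&counter, 1);
#endif
}

// Path 2: OpenMP atomic increment (build with OpenMP enabled,
// otherwise the pragma is ignored and this is a plain increment).
static void ompAdd()
{
#pragma omp atomic
    ++counter;
}

// Calls fn `iterations` times from the calling thread and
// returns the elapsed wall-clock time in seconds.
template <typename Fn>
double timeCalls(Fn fn, long iterations)
{
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        fn();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call timeCalls(intrinsicAdd, N) and timeCalls(ompAdd, N) with the same large N and compare the two results. Keeping each increment in its own non-inlined function, as in the sample above, makes the two paths easy to find in the assembly listing.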


It looks like the OpenMP atomic is slower here. You may use InterlockedIncrement instead.
