#pragma omp atomic vs InterlockedDecrement


vasci_'s picture
#pragma omp atomic

My benchmarks show that InterlockedDecrement is much faster than #pragma omp atomic.
Why? I would have thought the compiler could generate inline code here.
Composer XE 2011, Windows 64-bit.
om-sachan (Intel)'s picture

Could you please provide sample code that we can compile to review the issue?


vasci_'s picture

This is my "sample" code. I am using Composer XE 2011 Update 3 64-bit compiler.

This is pretty simple.

If you generate an assembly-language listing, the OpenMP code calls

__kmpc_global_thread_num and

While the "Windows" code seems to be doing it as inline assembly.

LONG refs_=0;

void WINatomicAdd()


void OMPatomicAdd()
#pragma omp atomic
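
The listing above shows only the function signatures. A minimal sketch of what such a pair might look like follows, assuming each function simply increments the shared counter (the thread title mentions a decrement, so the actual benchmark may differ). The `hammer` helper is hypothetical and exists only to exercise both variants, and the non-Windows branch is a stand-in built on a GCC/Clang builtin so the sketch compiles off Windows; neither comes from the original post.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>   // LONG, InterlockedIncrement
#else
// Assumption: stand-ins for the Windows names so the sketch builds elsewhere.
typedef long LONG;
static inline LONG InterlockedIncrement(volatile LONG* p) {
    // GCC/Clang builtin: atomically adds 1 and returns the new value.
    return __atomic_add_fetch(p, 1, __ATOMIC_SEQ_CST);
}
#endif

LONG refs_ = 0;

// "Windows" variant: typically compiled to a single locked instruction.
void WINatomicAdd() {
    InterlockedIncrement(&refs_);
}

// OpenMP variant: the atomic construct protects the update that follows it.
void OMPatomicAdd() {
#pragma omp atomic
    refs_++;
}

// Hypothetical helper: resets the counter, calls fn() n times from an
// OpenMP parallel loop, and returns the final counter value.
LONG hammer(void (*fn)(), int n) {
    refs_ = 0;
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        fn();
    }
    return refs_;
}
```

Both variants should leave the counter equal to the number of calls; only the cost per update differs. The pragmas take effect only when OpenMP is enabled at compile time (for example `/Qopenmp` with the Intel compiler on Windows, or `-fopenmp` with GCC); without it the loop runs serially and the result is still correct.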

om-sachan (Intel)'s picture

It looks like the OpenMP atomic is slower. You may use InterlockedIncrement instead.
