#pragma omp atomic vs InterlockedDecrement


My benchmarks have shown that InterlockedDecrement is much faster than using #pragma omp atomic.
Why? I would have thought the compiler could generate inline code here.
Composer XE 2011, Windows 64.

Could you please provide sample code that we can compile to review the issue?


This is my "sample" code. I am using the Composer XE 2011 Update 3 64-bit compiler.

This is pretty simple.

If you do an assembly language listing, the omp code calls

__kmpc_global_thread_num and

The "Windows" code, by contrast, seems to be doing it inline in assembly.

LONG refs_=0;

void WINatomicAdd()


void OMPatomicAdd()
#pragma omp atomic
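
For anyone who wants to reproduce this, a compilable version of the comparison might look roughly like the sketch below. Only refs_ and the two function names come from the code above. The function bodies (a plain increment), the non-Windows fallback using the GCC/Clang __atomic_fetch_add builtin, and the countBoth driver are assumptions added for illustration.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
typedef long LONG;   // assumption: stand-in type so the sketch also builds off Windows
#endif

LONG refs_ = 0;

// Assumed body: atomically increment the shared counter via the Windows API.
void WINatomicAdd()
{
#ifdef _WIN32
    InterlockedIncrement(&refs_);
#else
    // Assumption: GCC/Clang builtin used as a stand-in for InterlockedIncrement.
    __atomic_fetch_add(&refs_, 1L, __ATOMIC_SEQ_CST);
#endif
}

// Assumed body: the same increment, protected by the OpenMP atomic construct.
void OMPatomicAdd()
{
    #pragma omp atomic
    refs_++;
}

// Hypothetical driver: calls each function n times and returns the final count.
// The two loops run one after the other, so the mechanisms are never mixed
// on refs_ at the same time.
LONG countBoth(int n)
{
    assert(n >= 0);
    refs_ = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        WINatomicAdd();

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        OMPatomicAdd();

    return refs_;
}
```

Calling countBoth(1000) should return 2000 whether or not OpenMP is enabled. To compare the generated code, time each loop separately or look at the assembly listing for the two functions.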

It looks like the OpenMP atomic is slower. You may use InterlockedIncrement instead.
