IA-32 and Intel 64 (host) processors guarantee atomic loads and stores of:
a word aligned on a 16-bit boundary
a doubleword aligned on a 32-bit boundary
on P6 and later, aligned or unaligned word, dword and qword accesses that fit within a single cache line
What are the guaranteed atomic load and store operations on Xeon Phi?
The reason I ask is that I am observing non-atomic stores of dword and qword (__int32 and __int64) values within a cache line, where different threads write to different variables in that cache line. If I add padding between the values so that they fall in different cache lines, the stores do not interfere. When the padding is removed, the stores interfere. I examined the disassembly to ensure that only GP-register-to-memory instructions are used (IOW, only mov of a register to memory).
The code is store-only, with different threads writing to different locations. These are not read-modify-write instructions.
The same code strategy works on the host (Xeon E5).
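A minimal sketch of the store-only pattern described above. It is not the actual test code. The names SharedLine, writer and run_store_test, the tag values, and the 64-byte cache line size are all assumptions made for illustration. Each thread stores only to its own slot in a shared cache line, and the check after the threads are joined verifies that every slot holds the last value its own thread wrote.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Four independently written slots packed into one cache line.
// Assumption: the cache line is 64 bytes.
struct alignas(64) SharedLine {
    volatile std::int32_t a;
    volatile std::int32_t b;
    volatile std::int64_t c;
    volatile std::int64_t d;
};
static_assert(sizeof(SharedLine) == 64, "all slots must share one cache line");

// Store-only loop: writes tag + i to the thread's own slot.
// No loads and no read-modify-write operations on shared data.
template <typename T>
void writer(volatile T* slot, T tag, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        *slot = tag + static_cast<T>(i);
    }
}

// Returns true if every slot ends up holding the last value stored by
// the one thread that owns it, i.e. no store disturbed a neighbour.
bool run_store_test(int iterations) {
    assert(iterations > 0);
    SharedLine line{};

    // Illustrative tags; the 64-bit tags differ in both halves so a torn
    // or overwritten qword would be visible in the final value.
    const std::int32_t tagA = 0x10000000;
    const std::int32_t tagB = 0x20000000;
    const std::int64_t tagC = 0x3000000030000000LL;
    const std::int64_t tagD = 0x4000000040000000LL;

    std::thread t1(writer<std::int32_t>, &line.a, tagA, iterations);
    std::thread t2(writer<std::int32_t>, &line.b, tagB, iterations);
    std::thread t3(writer<std::int64_t>, &line.c, tagC, iterations);
    std::thread t4(writer<std::int64_t>, &line.d, tagD, iterations);
    t1.join();
    t2.join();
    t3.join();
    t4.join();

    const int last = iterations - 1;
    return line.a == tagA + last && line.b == tagB + last &&
           line.c == tagC + last && line.d == tagD + last;
}
```

Calling run_store_test(100000) from main should return true on any platform where stores to distinct variables in one cache line do not interfere. Because this sketch only checks the final values, a transient disturbance that is later overwritten by the owning thread would go unnoticed. To compare against the padded variant, each member could be given its own alignas(64).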