I am trying to understand the Intel memory model so that I can write some multithreaded code. My aim is to copy some data into a buffer and then set an index so that another thread can access the data. The data is copied with a rep movs instruction, and I have tried using an xchg instruction to store the index value:
    mov edi,edx
    mov ecx, 0x12
    rep movs DWORD PTR es:[edi], DWORD PTR ds:[esi]
    xchg DWORD PTR [ebx+0xdc],eax
Alternatively, I have tried using a plain mov to store the index, followed by a lock-prefixed instruction (a lock or on the top of the stack):
    mov edi,edx
    mov ecx, 0x12
    rep movs DWORD PTR es:[edi], DWORD PTR ds:[esi]
    mov DWORD PTR [ebx+0xdc],eax
    lock or DWORD PTR [esp],0x0
From my testing, the xchg version does not seem to work as I hoped, but the mov and lock version does. From reading the Intel® 64 and IA-32 Architectures Software Developer’s Manual, it would seem that the two should be equivalent. Is there a subtle difference between the xchg method and the mov followed by lock method?
From examples 8-13 and 8-14, the string movs instructions are not reordered with other stores, and sections 8.2.3.8 and 8.2.3.9 state that lock-prefixed and xchg instructions cannot be reordered with other stores. Therefore, I assume the processor cannot reorder the instructions in the code examples above.
However, what is not clear is what another processor may see.
From section “8.1 Lock operations”: “Because frequently used memory locations are often cached in a processor’s L1 or L2 caches, atomic operations can often be carried out inside a processor’s caches without asserting the bus lock. Here the processor’s cache coherency protocols ensure that other processors that are caching the same memory locations are managed properly while atomic operations are performed on cached memory locations.”
Also from “8.1.2.2 Software Controlled Bus Locking”: “locked operations serialize all outstanding load and store operations (that is, wait for them to complete)”
From “8.2.5 Strengthening or Weakening the Memory-Ordering Model”: “Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory”
From “8.3 SERIALIZING INSTRUCTIONS”: “The processor does not write back the contents of modified data in its data cache to external memory when it serializes instruction execution.”
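Sections 8.2.5 and 8.3 seem to contradict each other: one says buffered writes drain to memory, while the other says modified cache data is not written back.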
Do the above statements mean that the lock operations (either a lock-prefixed instruction or xchg) ensure that all previous instructions are complete, but do not ensure that their effects are visible to another processor? If this is the case, how can I ensure that the data in my buffer is visible to another processor when the index is changed?
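
To make the intent concrete, here is a minimal C++ sketch of the hand-off I am after. It is not my actual code: Channel, publish_xchg, publish_store_fence, consume and run_handoff are hypothetical names chosen for illustration, and I am assuming an index of 0 means "nothing published yet". The comments about which instructions are generated describe what compilers typically emit on x86, not a guarantee.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>

constexpr std::size_t kWords = 0x12;  // 18 dwords, matching ecx in the rep movs above

// Hypothetical shared state: a data buffer plus the index that publishes it.
struct Channel {
    std::array<std::uint32_t, kWords> buffer{};
    std::atomic<std::uint32_t> index{0};  // assumption: 0 means "not published"
};

// Variant 1: copy, then publish with an atomic exchange
// (typically a single xchg with a memory operand on x86).
void publish_xchg(Channel& ch, const std::uint32_t* src, std::uint32_t new_index) {
    assert(new_index != 0);
    std::memcpy(ch.buffer.data(), src, sizeof(ch.buffer));
    ch.index.exchange(new_index, std::memory_order_seq_cst);
}

// Variant 2: copy, publish with a release store (typically a plain mov on x86),
// then a full fence (typically mfence or a locked no-op such as lock or on the stack).
void publish_store_fence(Channel& ch, const std::uint32_t* src, std::uint32_t new_index) {
    assert(new_index != 0);
    std::memcpy(ch.buffer.data(), src, sizeof(ch.buffer));
    ch.index.store(new_index, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Consumer: wait until the index is set, then read the buffer.
// The acquire load pairs with the release (or stronger) write of the index.
std::array<std::uint32_t, kWords> consume(Channel& ch) {
    while (ch.index.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return ch.buffer;
}

// Runs one producer and one consumer and reports whether the consumer
// observed exactly the data that was copied before the index was set.
bool run_handoff(bool use_xchg) {
    Channel ch;
    std::array<std::uint32_t, kWords> src{};
    for (std::size_t i = 0; i < kWords; ++i) {
        src[i] = 0xA0000000u + static_cast<std::uint32_t>(i);
    }
    std::array<std::uint32_t, kWords> seen{};
    std::thread consumer([&] { seen = consume(ch); });
    std::thread producer([&] {
        if (use_xchg) {
            publish_xchg(ch, src.data(), 1);
        } else {
            publish_store_fence(ch, src.data(), 1);
        }
    });
    producer.join();
    consumer.join();
    return seen == src;
}
```

Calling run_handoff(true) exercises the exchange variant and run_handoff(false) the store plus fence variant. Under the C++ memory model both are valid publications, because the consumer only reads the buffer after an acquire load observes the new index. That is the behaviour I expected from both of my assembly versions.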