In reading the memory ordering section of Intel's Combined Software Developer's manual located here:
Volume 3, Chapter 8, Section 8.2.3.5 (page 2,115 in that PDF) states:
[Intra-Processor Forwarding is Allowed]: The memory-ordering model allows concurrent stores by two processors to be seen in different orders by those two processors; specifically, each processor may perceive its own store occurring before that of the other.
This has always made sense to me in the context of separate memory locations (as their example shows). However, what if Processor 0 and Processor 1 both issued stores to the same location but with differing values? For example:
[logical processor 0]: mov [_x], 1
[logical processor 1]: mov [_x], 2
Read literally, the statement quoted above would allow for the possibility that [logical processor 0] ends up seeing 2 in _x while [logical processor 1] ends up seeing 1 in _x. Obviously cache coherency is designed to not allow that to happen, and I'm sure at the low level this can be explained in terms of MESI, but is there a section in the manual(s) that covers this case and specifically guarantees that both logical processors will converge on a single coherent value (after store forwarding, etc. happens)?
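To make the scenario concrete, here is a minimal sketch in Rust of the outcome I am asking about. It is not taken from the manual: the function names (`store_then_observe`, `race_once`, `went_backwards`) are made up for illustration, and it relies on the Rust/C++11 rule that every atomic location has a single modification order, not on anything Intel documents. On x86 a relaxed atomic store is typically compiled to a plain `mov`, so it should correspond to the two instructions above, though that is an assumption about the compiler's output.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Stores `mine` to the shared location, then reads it back `reads` times.
/// Returns every value observed after the store, in program order.
fn store_then_observe(x: Arc<AtomicU32>, mine: u32, reads: usize) -> Vec<u32> {
    // Relaxed: no ordering with other locations, no LOCK-style barrier requested.
    x.store(mine, Ordering::Relaxed);
    (0..reads).map(|_| x.load(Ordering::Relaxed)).collect()
}

/// Runs one race between "processor 0" (stores 1) and "processor 1" (stores 2).
/// Returns (observations of thread 0, observations of thread 1, final value).
fn race_once(reads: usize) -> (Vec<u32>, Vec<u32>, u32) {
    let x = Arc::new(AtomicU32::new(0));
    let (x0, x1) = (Arc::clone(&x), Arc::clone(&x));
    let t0 = thread::spawn(move || store_then_observe(x0, 1, reads));
    let t1 = thread::spawn(move || store_then_observe(x1, 2, reads));
    let seen0 = t0.join().unwrap();
    let seen1 = t1.join().unwrap();
    // Both writers have finished, so this reads the last store in the
    // location's modification order.
    (seen0, seen1, x.load(Ordering::Relaxed))
}

/// True if a thread saw the other thread's value and later saw its own again,
/// which would mean the two stores did not have one agreed order.
fn went_backwards(seen: &[u32], mine: u32) -> bool {
    match seen.iter().position(|&v| v != mine) {
        Some(i) => seen[i..].iter().any(|&v| v == mine),
        None => false,
    }
}
```

Calling `race_once(1000)` and checking `went_backwards` on each thread's observations never reports a reversal for me, and I never see thread 0 finish on 2 while thread 1 finishes on 1. That is only an experiment, of course, not the official statement I am looking for.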
The manual is so detailed and helpful that I am sure I am missing something. References would be insanely appreciated as my OCD would certainly be calmed with an official statement that a LOCK prefix isn't needed to ensure total ordering or some other such odd thing in this case.
Thanks in advance to everyone.