We are working on a new research operating system. For message passing we use several mechanisms, including polling, IPIs, and monitor/mwait. To benchmark them, we send a ping-pong message between two processes running on two different cores and count the number of cycles the round trip takes, measured on the sender's core. What confuses us is that the monitor/mwait round-trip time differs by a few hundred cycles when we change the address of the monitored area. I should mention that we use the write-back cache policy, and the processor is an Intel(R) Xeon(R) CPU E31270 @ 3.40GHz, which is not a NUMA system. The two addresses we used are relatively close to each other: the first was 0x100 and the second was 0xA0800.
Is monitor/mwait performance address-dependent?
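
To rule out one obvious difference between the two addresses, here is a small sketch that checks where each one falls within a monitored line. It assumes a 64-byte monitor line, which is only an assumption here (the actual size is reported by CPUID leaf 05H); `MONITOR_LINE_SIZE`, `line_index`, and `line_offset` are names made up for this illustration and are not part of our kernel.

```python
# Assumption: the monitored range is one 64-byte line.
# The real monitor-line size is reported by CPUID leaf 05H.
MONITOR_LINE_SIZE = 64


def line_index(addr, line_size=MONITOR_LINE_SIZE):
    """Return the index of the line that contains addr."""
    return addr // line_size


def line_offset(addr, line_size=MONITOR_LINE_SIZE):
    """Return the byte offset of addr within its line (0 means line-aligned)."""
    return addr % line_size


# The two monitor addresses used in the benchmark.
ADDRESSES = (0x100, 0xA0800)

for addr in ADDRESSES:
    print(f"{addr:#x}: line {line_index(addr)}, offset {line_offset(addr)}")
```

Under that assumption both addresses sit at offset 0 of their line, so alignment within the monitored line does not look like the difference between the two cases.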