monitor/mwait performance differs in different memory addresses

Hi everybody,

We are working on a new research operating system. For message passing we use several mechanisms, including polling, IPIs, and monitor/mwait. To benchmark them, we send a ping-pong message between two processes running on two different cores and count the number of cycles for the round trip on the sender core. What confuses us is that monitor/mwait performance seems to differ by a few hundred cycles if we change the address of the monitored area. I should mention that we use the write-back cache policy, and the processor is an Intel(R) Xeon(R) CPU E31270 @ 3.40GHz, which is not NUMA. We used two addresses that are relatively close to each other: the first was 0x100 and the second was 0xA0800.

Is monitor/mwait performance address dependent?

Maybe because the address is at very low memory range?

Are these physical addresses or virtual addresses?

Do you "touch" memory in the same page as the monitored location before entering monitor in both cases? (i.e., preload the page-table entry in case it is not already loaded)

Can you set up a test that uses addresses 0x100 and 0xA0100? (i.e., the same relative offset within a page)

Jim Dempsey

www.quickthreadprogramming.com
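
To make the "same relative offset within a page" suggestion concrete, here is a minimal sketch that splits an address into page number, page offset, and cache-line offset. The 4 KiB page size and 64-byte line size are assumptions, and the helper names are made up for illustration; they are not taken from the original poster's OS.

```c
#include <stdint.h>

/* Assumed sizes: 4 KiB pages and 64-byte cache lines. Adjust for the real system. */
#define PAGE_SIZE 4096u
#define LINE_SIZE 64u

/* Hypothetical helpers: decompose an address into its page and line components. */
static inline uint64_t page_number(uint64_t addr) { return addr / PAGE_SIZE; }
static inline uint64_t page_offset(uint64_t addr) { return addr & (PAGE_SIZE - 1); }
static inline uint64_t line_offset(uint64_t addr) { return addr & (LINE_SIZE - 1); }

/* True when two addresses sit at the same offset inside their respective pages. */
static inline int same_page_offset(uint64_t a, uint64_t b)
{
    return page_offset(a) == page_offset(b);
}
```

With these assumptions, 0x100 and 0xA0800 have page offsets 0x100 and 0x800, while 0x100 and 0xA0100 share page offset 0x100. All three addresses start on a 64-byte line boundary.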

>>...Is monitor/mwait performance address dependent?

Take a look at the description of the MONITOR instruction on page 560 of the latest Instruction Set Reference. It contains this statement:

...
the address range that the monitoring hardware checks for store operations can be determined by using CPUID...
...

If one of the addresses you've mentioned (let's say 0x100) is outside the allowed memory range, then according to the manual it is not at all clear what happens.

That range is 64 bytes on the hardware; I tested that recently. CPUID reports the minimum and maximum range, and it matches what I said.

Perfwise
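
For anyone who wants to repeat the CPUID check, here is a rough sketch of how the monitor line sizes can be read from user mode. It assumes GCC or Clang on x86 (it relies on `__get_cpuid` from `<cpuid.h>`), and the function name `monitor_line_sizes` is invented for this example. CPUID leaf 05H returns the smallest monitor-line size in EAX[15:0] and the largest in EBX[15:0], both in bytes.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper; an assumption about the toolchain */
#endif

/*
 * Hypothetical helper: returns 1 and fills in the smallest and largest
 * monitor-line sizes (in bytes) if CPUID leaf 05H is available, else 0.
 */
static int monitor_line_sizes(unsigned *smallest, unsigned *largest)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
        return 0;

    *smallest = eax & 0xFFFFu;  /* EAX[15:0]: smallest monitor-line size */
    *largest  = ebx & 0xFFFFu;  /* EBX[15:0]: largest monitor-line size */
    return 1;
#else
    (void)smallest;
    (void)largest;
    return 0;                   /* not an x86 build: nothing to query */
#endif
}
```

Note that a hypervisor may hide MONITOR/MWAIT and report zeros here, so the result is only meaningful on bare metal or when the feature is exposed to the guest.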

Presumably one logical processor is monitoring 0x100 and a different logical processor is monitoring 0xA0800, and each has a monitor "window" within one cache line. Hamid, would you comment on this?

Jim Dempsey

www.quickthreadprogramming.com

Is such a low memory address as 0x100 (if it is a physical address) available to user-mode software?

I suppose that those are really relative offsets within the process's memory page(s).
