monitor/mwait performance differs in different memory addresses

Hi everybody,

We are working on a new research operating system. For message passing we use several mechanisms, including polling, IPIs, and monitor/mwait. To benchmark them, we send a ping-pong message between two processes running on two different cores and count the number of cycles for the round trip on the sender core. What confuses us is that monitor/mwait's performance seems to differ by a few hundred cycles when we change the address of the monitored area. I should mention that we use the write-back (WB) cache policy, and that the processor is an Intel(R) Xeon(R) CPU E31270 @ 3.40GHz, which is not NUMA. We used two addresses that are relatively close to each other: the first was 0x100 and the second was 0xA0800.

Is monitor/mwait performance address dependent?

iliyapolak:

Maybe because the address is in a very low memory range?

jimdempseyatthecove:

Are these physical addresses or virtual addresses?

Do you "touch" memory in the same page as the monitored location before entering monitor in both cases? (i.e., preload the page table entry in case it is not already loaded)

Can you set up a test that uses addresses 0x100 and 0xA0100? (i.e., the same relative offset within a page)

Jim Dempsey

www.quickthreadprogramming.com
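
The suggestion above, comparing addresses at the same offset within a page, can be checked with a little address arithmetic. The following is only a sketch: the 4 KiB page size and 64-byte line size are assumptions, and the names `PG_BYTES`, `LINE_BYTES`, `page_number`, `page_offset`, `line_in_page`, and `describe` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed sizes; neither is confirmed for the poster's configuration. */
enum { PG_BYTES = 4096, LINE_BYTES = 64 };

static_assert((PG_BYTES & (PG_BYTES - 1)) == 0, "page size must be a power of two");
static_assert((LINE_BYTES & (LINE_BYTES - 1)) == 0, "line size must be a power of two");

/* Page that contains the address. */
uint64_t page_number(uint64_t addr) { return addr / PG_BYTES; }

/* Byte offset of the address within its page. */
uint64_t page_offset(uint64_t addr) { return addr & (uint64_t)(PG_BYTES - 1); }

/* Index of the cache line within the page. */
uint64_t line_in_page(uint64_t addr) { return page_offset(addr) / LINE_BYTES; }

/* Print the decomposition of one address. */
void describe(uint64_t addr) {
    printf("0x%llx: page 0x%llx, offset 0x%llx, line %llu\n",
           (unsigned long long)addr,
           (unsigned long long)page_number(addr),
           (unsigned long long)page_offset(addr),
           (unsigned long long)line_in_page(addr));
}
```

Under these assumptions, 0x100 and 0xA0800 sit at different offsets within their pages (0x100 versus 0x800), while 0x100 and 0xA0100 share offset 0x100, so the proposed test changes only the page and not the position inside it.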

>>...Is monitor/mwait performance address dependent?

Take a look at the description of the MONITOR instruction in the latest Instruction Set Reference, on page 560. It contains this statement:

...
the address range that the monitoring hardware checks for store operations can be determined by using CPUID...
...

If one of the addresses you've mentioned (let's say 0x100) is outside the allowed memory range, then according to the manual it is not at all clear what happens.

That range is 64B on HW. I recently tested that. CPUID provides the minimum and maximum range, and it matches what I said.

Perfwise
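
For readers who want to check the monitor range on their own machine, here is a minimal sketch of the CPUID query discussed above. It assumes a GCC or Clang toolchain that provides `<cpuid.h>` and its `__get_cpuid` helper; the names `monitor_info`, `decode_monitor_leaf`, and `query_monitor_info` are made up for illustration. CPUID leaf 05H reports the smallest monitor-line size in EAX[15:0] and the largest in EBX[15:0], both in bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header; an assumption about the toolchain */
#endif

struct monitor_info {
    unsigned smallest_line;  /* CPUID.05H:EAX[15:0], in bytes */
    unsigned largest_line;   /* CPUID.05H:EBX[15:0], in bytes */
};

/* Decode raw EAX/EBX values returned by CPUID leaf 05H. */
struct monitor_info decode_monitor_leaf(uint32_t eax, uint32_t ebx) {
    struct monitor_info mi = { eax & 0xFFFFu, ebx & 0xFFFFu };
    return mi;
}

/* Query the running CPU. Returns 0 on success, -1 if MONITOR/MWAIT
 * or leaf 05H is not reported (or the build is not for x86). */
int query_monitor_info(struct monitor_info *out) {
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    /* CPUID.01H:ECX bit 3 signals MONITOR/MWAIT support. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 3)))
        return -1;
    if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
        return -1;
    *out = decode_monitor_leaf(eax, ebx);
    return 0;
#else
    return -1;
#endif
}
```

CPUID itself can be executed from user mode, so this query does not need kernel privileges even though MONITOR and MWAIT normally do.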

jimdempseyatthecove:

Presumably one logical processor is monitoring 0x100 and a different logical processor is monitoring 0xA0800, and each has a monitor "window" within one cache line. Hamid, would you comment on this?

Jim Dempsey

www.quickthreadprogramming.com
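
If each core's monitor window is meant to cover exactly one line, one way to guarantee that is to pad and align each monitored word to the line size so that no unrelated store lands in the same window. This is only a sketch: the 64-byte `MONITOR_LINE` value is an assumption taken from the figure reported earlier in the thread, and `struct mailbox`, `boxes`, and `same_monitor_line` are hypothetical names, not anything from the original poster's system.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

enum { MONITOR_LINE = 64 };  /* assumed monitor-line size in bytes */

/* One mailbox per core: the flag owns a whole monitor line. */
struct mailbox {
    alignas(MONITOR_LINE) volatile uint32_t flag;
    unsigned char pad[MONITOR_LINE - sizeof(uint32_t)];
};

static_assert(sizeof(struct mailbox) == MONITOR_LINE, "mailbox must fill one line");
static_assert(alignof(struct mailbox) == MONITOR_LINE, "mailbox must be line-aligned");

/* Two mailboxes, e.g. one monitored by each core in a ping-pong test. */
struct mailbox boxes[2];

/* Nonzero if both pointers fall inside the same monitor line. */
int same_monitor_line(const volatile void *a, const volatile void *b) {
    return ((uintptr_t)a / MONITOR_LINE) == ((uintptr_t)b / MONITOR_LINE);
}
```

With this layout a store to `boxes[1].flag` cannot wake a core that is monitoring `boxes[0].flag`, which removes false wakeups as one possible source of timing differences between the two addresses.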
iliyapolak:

Is such a low memory address as 0x100 (if it is a physical address) available to user-mode software?

iliyapolak:

I suppose those are really relative offsets within the process's memory page(s).
