Cache Coherency

I am running VTune on a dual Xeon processor system. I would like to measure the coherency misses between the two processors. Are there any parameters in VTune to do this?

Also, is it possible to use the CPUID assembly instruction to uniquely identify the processor (in the case of a DP system)? Basically, I would like my user-level pthreads running on the DP Xeon processors to be able to identify which processor they are running on. Is that possible?

Thanks
Gautham

Hi Gautham, you can measure reads/writes to the same cache line by two different processors with the Memory Order Machine Clear event. You can always read the same cache line from multiple processors. However, if you read and write the same cache line from multiple processors (i.e. read with one thread while another thread writes to the same 128 bytes), you will pay a pretty high performance penalty. The Memory Order Machine Clear event will fire every time this happens.

For OS threads you can set which processor you are running on with SetThreadAffinityMask(). However, this is not possible for user-mode threads, since they have their own scheduler and each user-mode thread may not map to a separate OS thread. You can check this by looking at the thread view for your process in VTune and seeing whether the number of threads corresponds to the number of pthreads you are using. As for programmatically determining which processor you are executing on, I am not sure if there is a way to do this. In kernel mode you can call KeGetCurrentProcessorNumber. I am not sure if there is a user-mode equivalent.
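On the CPUID part of the question, here is a minimal sketch, not a definitive answer. It assumes an x86 build with GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`. CPUID leaf 1 reports the initial APIC ID of the logical processor that executed the instruction in bits 31..24 of EBX. The helper names `apic_id_from_ebx` and `current_initial_apic_id` are made up for illustration. Note that the OS may migrate the thread right after the instruction runs, so the value is only reliable for a thread whose affinity is fixed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86; elsewhere the query reports "unknown". */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

/* Hypothetical helper: extract the 8-bit initial APIC ID,
   which CPUID leaf 1 returns in bits 31..24 of EBX. */
uint32_t apic_id_from_ebx(uint32_t ebx)
{
    return (ebx >> 24) & 0xFFu;
}

/* Hypothetical helper: initial APIC ID of the logical processor that
   executes CPUID, or -1 if CPUID leaf 1 is not available in this build. */
int current_initial_apic_id(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)apic_id_from_ebx(ebx);
#else
    return -1;
#endif
}
```

Each thread would call `current_initial_apic_id()` and compare the result against the IDs observed on the other processors. How APIC IDs map to physical packages and logical processors depends on the system, so treat the value as an opaque identifier unless you decode the topology separately.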

When I first saw this post, I was thinking of the P-III Xeon, and waiting for someone with knowledge of the past to answer! Many people continue to post questions about MT for those older models.

We have verified experimentally that false sharing occurs between logical processors, as well as between separate processors, when one thread reads and the other writes to the same 128-byte line, as Birju pointed out. When two threads write to the same cache line, the problem is restricted to a 64-byte line.
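The usual way to avoid this kind of false sharing is to give each thread's data its own 128-byte region. Below is a minimal C11 sketch of that idea, taking the 128-byte figure from this thread as an assumption (other processors may use a different granularity). The names `padded_counter`, `counters`, `bump` and `slot_distance` are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128 bytes is the sharing granularity reported in this
   thread for the Xeon systems discussed. */
#define SHARING_GRANULE 128
#define MAX_THREADS 4

/* Aligning the member to 128 bytes also rounds sizeof up to 128,
   so adjacent array elements never share a 128-byte region. */
struct padded_counter {
    alignas(SHARING_GRANULE) volatile long value;
};

static_assert(sizeof(struct padded_counter) == SHARING_GRANULE,
              "each counter must occupy exactly one 128-byte region");

/* One slot per thread; thread i only ever touches counters[i]. */
static struct padded_counter counters[MAX_THREADS];

/* Increment the slot owned by the given thread index. */
void bump(int thread_index)
{
    counters[thread_index].value++;
}

/* Distance in bytes between the slots of two thread indices. */
size_t slot_distance(int a, int b)
{
    uintptr_t pa = (uintptr_t)&counters[a];
    uintptr_t pb = (uintptr_t)&counters[b];
    return (size_t)(pb > pa ? pb - pa : pa - pb);
}
```

The padding trades memory for isolation: four counters take 512 bytes instead of 32 or fewer. The alignment is guaranteed here because `counters` is a static array; memory from plain `malloc` would need an aligned allocation to get the same effect.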

I'm not certain whether there may be possible BIOS variations which would affect these conclusions. I assume that Birju's tip about MOMC events should help when diagnosing any of these false sharing cases.

Thanks for your replies. One more question. I launch 4 threads on an MP Xeon system (2 logical processors per processor = 4 logical processors). So I guess even in this case the OS will keep switching my 4 threads between the 4 logical processors? Is there any way I could pin each of them to a particular logical processor, so that a thread always runs on only one logical processor and the OS does not keep switching it?

Thanks

Yes, assuming you're using one of the more recent versions of Windows. There are OS calls to set processor affinity.

You ought to be able to find some working examples on the developer.intel.com index to Hyper-Threading Technology.

For example, Khang Nguyen's CPU Counting Utility Code Sample may help.

Yes, you can do this using SetThreadAffinityMask on Windows.
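For reference, a minimal sketch of pinning the calling thread to one logical processor. On Windows it uses `SetThreadAffinityMask`, as suggested above. The Linux branch is an addition not discussed in this thread and uses `sched_setaffinity` with a pid of 0, which applies to the calling thread. The helper names `single_cpu_mask` and `pin_current_thread` are hypothetical.

```c
#define _GNU_SOURCE /* needed on Linux for cpu_set_t and sched_setaffinity */
#include <assert.h>
#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

/* Hypothetical helper: build a mask with only bit `cpu` set, which is the
   form SetThreadAffinityMask expects (bit n = logical processor n). */
uint64_t single_cpu_mask(unsigned cpu)
{
    assert(cpu < 64);
    return (uint64_t)1 << cpu;
}

/* Hypothetical helper: pin the calling thread to one logical processor.
   Returns 0 on success and -1 on failure or on an unsupported platform. */
int pin_current_thread(unsigned cpu)
{
#if defined(_WIN32)
    /* DWORD_PTR is 32 bits wide on 32-bit Windows, so cpu must be < 32 there. */
    DWORD_PTR mask = (DWORD_PTR)single_cpu_mask(cpu);
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0 ? 0 : -1;
#elif defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : -1;
#else
    (void)cpu;
    return -1;
#endif
}
```

With 4 threads on 4 logical processors, each thread would call `pin_current_thread(i)` with its own index `i` (0 to 3) at startup. The call can fail if the requested processor is not in the set the process is allowed to use, so the return value should be checked.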
