Some of the Xeon Phi documentation mentions that physical addresses are distributed among the memory controllers using a hashing function, and that cache line tags are distributed among the per-core portions of the distributed tag directory via a (presumably different) hashing function.
I'd like to know the details of those hashing functions. I'm trying to measure memory-access latency and bandwidth as a function of the originating core and the targeted memory controller. Likewise, for local L2 misses, I want to measure them as a function of (a) the originating core, (b) the core that hosts the relevant tag directory portion, and (c) the core that holds the data in its L2. Without the hashing functions I can't tell which memory controller or tag directory portion a given address maps to.
I realise this is extremely low-level and perhaps beyond what Intel wants to document, but it's something we need to know for our performance tools.
Alternatively, are all of these communication costs perfectly symmetrical, so that the mapping makes no difference?
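
For reference, the kind of per-core latency measurement described above could be sketched roughly as follows. This is only a minimal illustration, not our actual tool: it assumes Linux (`sched_setaffinity`), a 64-byte cache line, and a buffer much larger than L2. The names `pin_to_cpu`, `build_chain`, `chase_ns_per_load` and `struct node` are made up for this post. It pins the thread to one core and times a chain of dependent loads. What it cannot do is say which memory controller or tag directory each load went to, which is exactly the information I'm asking about.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* One slot per cache line; 64 bytes is an assumption about the line size. */
struct node {
    size_t next;
    char pad[64 - sizeof(size_t)];
};

/* Pin the calling thread to one logical CPU (Linux-specific).
   Returns 0 on success, -1 on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Link the n slots into a single random cycle (Sattolo's algorithm), so that
   following .next from slot 0 visits every slot once before returning to 0.
   The random order is meant to defeat hardware prefetching. */
static void build_chain(struct node *buf, size_t n, unsigned seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i].next = i;
    if (n < 2)
        return;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j is in [0, i-1] */
        size_t t = buf[i].next;
        buf[i].next = buf[j].next;
        buf[j].next = t;
    }
}

/* Follow the chain for `steps` dependent loads and return the average time
   per load in nanoseconds. The final index is written to *sink so the
   compiler cannot discard the loop. */
static double chase_ns_per_load(const struct node *buf, size_t steps,
                                size_t *sink)
{
    struct timespec t0, t1;
    size_t idx = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        idx = buf[idx].next;                /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = idx;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)steps;
}
```

The intended use would be to call `pin_to_cpu(c)` for each core `c` in turn, build a chain over a buffer far larger than L2, and record `chase_ns_per_load` per core. To break the results down by memory controller or tag directory, the buffer would have to be restricted to addresses that hash to a single target.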