MYO "ours" address mapping

On a host (Xeon Nehalem and later) you have a NUMA architecture. When the system is configured for NUMA, an application can use an API to specify the node on which an allocation is performed; alternatively, "first touch" placement can be used. Set "first touch" aside for my questions relating to Xeon Phi. I am interested in directed allocation.

Question 1:

In the Xeon Phi Software Developer's Guide, Figure 2-12 is on page 32. The diagram and accompanying text illustrate that (given permissions and initialization) each Xeon Phi (up to 8) can map its own memory (the first 64GB of addresses) as well as the memory of Coprocessors 1 through 7 (64GB of addresses each). Not all addresses are populated. From Figure 2-12 it would appear that Coprocessors 1 through 7 are unable to map Coprocessor 0's memory. I cannot imagine that this is an oversight, so what mapping permits, say, Coprocessor 1 to map Coprocessor 0's memory?

Question 2:

The Host and Phi(s) can map each other's memory (with size limitations) using Virtual Shared Memory (somewhat analogous to NUMA, though not the same as NUMA). The C/C++ API has _Offload_shared_malloc(size) and related functions, but there is no argument to indicate whether the shared memory resides in the host-attached memory or in a Xeon Phi's attached memory, or in which of several Xeon Phis' attached memory. For complete orthogonality, one would need to be able to specify whether the shared memory is in the Host and, if not, in which specific Xeon Phi (or other attached offload device). Do you have additional information regarding this? I would imagine there are undocumented functions to do this; can we have the API, please?

Jim Dempsey

Also, related to Question 2:

The sketchy documentation on _Offload_shared_malloc(size) and related functions does not indicate whether code running on the MIC also has access to this function, with or without the ability to specify where the shared memory is placed.

Jim Dempsey

Regarding Question 2, _Offload_shared_malloc(..)

That API is usable only when using the Intel compiler's _Cilk_shared and _Cilk_offload feature. Memory allocated using that API is allocated in host memory and in the memory of each attached Xeon Phi device. The contents are synchronized between CPU and a Xeon Phi when an offload occurs to that Phi and it accesses the memory.
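To make the behavior described above concrete, here is a toy model in plain C. It does not use the Intel offload runtime or _Offload_shared_malloc at all; every name in it (synced_buffer, synced_offload, NUM_DEVICES, and so on) is hypothetical, and the copy-in/copy-out at the offload boundary is an assumed simplification of when synchronization happens. It only illustrates the idea that there is one copy on the host and one per device, and that a device's copy is brought up to date when an offload targets that device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_DEVICES 2 /* hypothetical number of attached coprocessors */
#define BUF_LEN 4     /* hypothetical buffer length, in ints */

/* Toy model: one host copy plus one private copy per device. */
typedef struct {
    int host[BUF_LEN];
    int device[NUM_DEVICES][BUF_LEN];
} synced_buffer;

/* Work to run "on the device"; operates on that device's copy only. */
typedef void (*kernel_fn)(int *data, size_t len);

/* Zero the host copy and every device copy. */
void synced_init(synced_buffer *b)
{
    memset(b, 0, sizeof *b);
}

/* Model of an offload to device `dev` (assumed semantics):
   copy host -> device, run the kernel on the device copy,
   then copy device -> host. Other devices are left untouched. */
void synced_offload(synced_buffer *b, int dev, kernel_fn kernel)
{
    assert(dev >= 0 && dev < NUM_DEVICES);
    memcpy(b->device[dev], b->host, sizeof b->host);
    kernel(b->device[dev], BUF_LEN);
    memcpy(b->host, b->device[dev], sizeof b->host);
}

/* Example kernel: increment every element. */
void add_one(int *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        data[i] += 1;
    }
}
```

In this model, writing to the host copy and then calling synced_offload(&b, 0, add_one) updates the host copy and device 0's copy, while device 1's copy remains stale until an offload targets device 1. That is the sense in which the memory is replicated and synchronized rather than placed on one specific node.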

Thanks Ravjiv.

So "shared" memory isn't truly shared; rather, it is synchronized memory. I will need to keep in mind that the "_Offload_shared_" prefix refers to copies that are synchronized when an offload occurs, as opposed to shared memory as viewed on a ccNUMA system. From my reading of the Xeon Phi datasheet, the host and coprocessor(s) could support NUMA (without fully supporting the cache coherence, the "cc", of ccNUMA). The datasheet states that the respective processors snoop the addresses written to PCIe-mapped memory, so cache line invalidation could be observed.

Jim Dempsey
