On a host (Xeon Nehalem and later) you have a NUMA architecture. When the system is configured for NUMA, an application can use an API to specify the node on which an allocation is performed; alternatively, "first touch" placement can be used. For my questions about Xeon Phi, set "first touch" aside: I am interested in directed allocation.
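As a point of reference for "directed allocation" on the host, here is a minimal sketch assuming x86-64 Linux. It reserves memory with mmap and binds it to one NUMA node with the mbind(2) system call before any page is touched. libnuma's numa_alloc_onnode(size, node) is the usual higher-level route. The raw system call is used here only so the example needs no extra library. The names alloc_on_node and free_on_node are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MPOL_BIND in <linux/mempolicy.h>; defined here to avoid that header. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/* Hypothetical helper: reserve `size` bytes and ask the kernel to place their
 * pages on NUMA node `node` (node < 64 assumed, one bit per node).
 * Returns NULL if the mapping fails. *bound is set to 1 if the binding was
 * accepted, 0 if the kernel refused it (no NUMA support, no privilege, ...). */
void *alloc_on_node(size_t size, int node, int *bound)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    unsigned long mask = 1UL << node;
    /* maxnode is passed as mask bits + 1, following libnuma's convention. */
    long rc = syscall(SYS_mbind, p, size, (unsigned long)MPOL_BIND, &mask,
                      (unsigned long)(sizeof mask * 8 + 1), 0UL);
    *bound = (rc == 0);

    /* Physical pages are allocated on first write; if rc == 0 they come from `node`. */
    return p;
}

void free_on_node(void *p, size_t size)
{
    munmap(p, size);
}
```

For example, alloc_on_node(1 << 20, 0, &bound) returns a 1 MiB buffer whose pages land on node 0 when bound is 1. The question below asks for the equivalent control over where Xeon Phi shared memory lives.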
Figure 2-12 on page 32 of the Xeon Phi Software Developer's Guide, together with the accompanying text, illustrates that (given permissions and initialization) each Xeon Phi (up to 8) can map its own memory (the first 64GB of addresses) as well as the memory of coprocessors 1 through 7 (64GB address windows). Not all addresses are populated. From figure 2-12 it would appear that coprocessors 1 through 7 are unable to map coprocessor 0’s memory. I cannot imagine that this is an oversight, so what is the mapping that permits, say, coprocessor 1 to map coprocessor 0’s memory?
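To make the gap concrete, here is a small model of the layout as read from figure 2-12. It assumes eight contiguous 64GB windows (64GB = 2^36 bytes), where window 0 is the viewing card's own memory and window k (1..7) is coprocessor k. The model and the names window_base and window_of are hypothetical illustrations, not taken from the guide.

```c
#include <stdint.h>

/* Assumed layout (reading of figure 2-12): eight contiguous 64 GB windows. */
#define WINDOW_SHIFT 36                          /* 64 GB = 2^36 bytes */
#define WINDOW_SIZE  (UINT64_C(1) << WINDOW_SHIFT)
#define NUM_WINDOWS  8

/* Base address of window k under this model:
 * window 0 = the card's own memory, windows 1..7 = coprocessors 1..7.
 * Returns UINT64_MAX for an out-of-range index. */
uint64_t window_base(int k)
{
    if (k < 0 || k >= NUM_WINDOWS)
        return UINT64_MAX;
    return (uint64_t)k * WINDOW_SIZE;
}

/* Window index that an address falls in, or -1 if it lies beyond all eight. */
int window_of(uint64_t addr)
{
    uint64_t k = addr >> WINDOW_SHIFT;
    return k < NUM_WINDOWS ? (int)k : -1;
}
```

Under this model, window 1 starts at 0x1000000000. The only index that could name coprocessor 0 is window 0, which is already the viewing card's local memory. That collision is exactly the mapping the question asks about.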
The host and the Phi(s) can map each other’s memory (with size limitations) using Virtual Shared Memory, which is somewhat analogous to NUMA, though not the same. The C/C++ API provides _Offload_shared_malloc(size) and related functions, but none takes an argument that indicates whether the shared memory resides in host-attached memory, in Xeon Phi-attached memory, or in the attached memory of which of several Xeon Phis. For complete orthogonality, one would need to be able to specify whether the shared memory resides on the host or on a specific Xeon Phi (or other attached offload device). Do you have additional information regarding this? I imagine there are undocumented functions that do this; can we have the API, please?
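The request can be stated concretely as a function signature. The sketch below is a hypothetical interface and is not part of Intel's offload API. The names placement, place_kind and shared_malloc_at are invented for illustration. It is meant as an _Offload_shared_malloc(size) counterpart that also names the home of the backing memory. The stub honours only host placement (via plain malloc) and reports ENOTSUP for a coprocessor, since no public call for that is known.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical placement descriptor (NOT an Intel API). */
typedef enum { PLACE_HOST, PLACE_COPROCESSOR } place_kind;

typedef struct {
    place_kind kind;
    int device;   /* coprocessor index 0..7; ignored for PLACE_HOST */
} placement;

/* Hypothetical shared allocation with an explicit home for the memory.
 * Host placement falls back to malloc; a valid coprocessor index yields
 * NULL with errno = ENOTSUP; anything else yields NULL with errno = EINVAL. */
void *shared_malloc_at(size_t size, placement where)
{
    if (where.kind == PLACE_HOST)
        return malloc(size);
    if (where.kind == PLACE_COPROCESSOR && where.device >= 0 && where.device < 8) {
        errno = ENOTSUP;   /* the missing piece the question asks Intel for */
        return NULL;
    }
    errno = EINVAL;
    return NULL;
}
```

A single placement argument covers both the host case and every coprocessor, which is the orthogonality described above.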