Is anyone aware of a basic tool for verifying first-touch memory allocation on a NUMA platform such as Xeon EP?
The usual expectation is that pinning each MPI process to a single CPU makes this happen automatically (barring running out of local memory, etc.), unless a non-NUMA BIOS option has been selected.
Likewise, with OpenMP, initializing data in a parallel access pattern consistent with the way the data will later be used should place each allocation in memory local to the CPU that uses it, rather than in remote memory.
For this to work, apparently, the affinity settings of the MPI or OpenMP library have to agree with the NUMA topology that the BIOS reports to the operating system.
It seems there might be a way to determine the address ranges which are local to each CPU on a shared memory platform and perform tests to see where each thread is placing its first touch allocation.
As you might guess, I'm trying to verify a suspected performance problem: threads within MPI ranks pinned to certain CPUs appear to be consistently using remote memory.