I have a NUMA system with one thread per core. Threads that process similar data are assigned to the same node, so that the data can be reused in that node's large L3 cache. I want threads assigned to the same node to steal each other's jobs. Once all jobs on a node have finished, those threads should steal jobs assigned to threads on other nodes. How can I implement this with OpenMP?
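OpenMP's loop schedules and task scheduler do not let you say "steal from threads on my node first". Placement of the threads can be controlled with the `OMP_PLACES` and `OMP_PROC_BIND` environment variables, but the stealing order generally has to be written by hand. Below is a minimal sketch of one way to do that, not a definitive implementation. Each thread gets its own queue of consecutively numbered jobs, and any thread claims the next job from a queue with an OpenMP `atomic capture`. A thread looks at its own queue first, then at the queues of the other threads on its node, and only then at threads on other nodes. The names `job_queue`, `claim_job`, `find_job`, `run_jobs` and the `process` callback are hypothetical. The code also assumes that thread `t` runs on node `t / threads_per_node`.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* One queue per thread. Jobs [next, end) are still unclaimed; the owner
   and any thief claim the next job by atomically incrementing `next`. */
typedef struct {
    long next; /* index of the next unclaimed job */
    long end;  /* one past the last job initially assigned to this thread */
} job_queue;

/* Claim one job from q. Returns the job index, or -1 if q is exhausted. */
static long claim_job(job_queue *q)
{
    long idx;
#pragma omp atomic capture
    idx = q->next++;
    return idx < q->end ? idx : -1;
}

/* Look for work for thread `tid`, in order of locality:
   1. its own queue, 2. queues of other threads on the same node,
   3. queues of threads on the other nodes.
   ASSUMPTION (hypothetical mapping): thread t runs on node t / threads_per_node. */
static long find_job(job_queue *queues, int tid, int nqueues, int threads_per_node)
{
    int my_node = tid / threads_per_node;
    int nnodes = (nqueues + threads_per_node - 1) / threads_per_node;

    for (int k = 0; k < nnodes; k++) {
        int node = (my_node + k) % nnodes;   /* k == 0 is the home node */
        int first = node * threads_per_node; /* first thread on that node */
        int count = nqueues - first < threads_per_node ? nqueues - first
                                                       : threads_per_node;
        /* Home node: start at our own queue. Remote node: rotate the start
           so that thieves from one node spread over different victims. */
        int start = (k == 0) ? tid - first : tid % count;
        for (int j = 0; j < count; j++) {
            long job = claim_job(&queues[first + (start + j) % count]);
            if (job >= 0)
                return job;
        }
    }
    return -1; /* every queue is exhausted: no job is left anywhere */
}

/* Run all jobs. jobs_per_thread[t] jobs are initially given to thread t,
   numbered consecutively from 0. process(job, tid) is the user's work
   function and is called concurrently from several threads.
   Returns the number of jobs run, or -1 if allocation fails. */
long run_jobs(const long *jobs_per_thread, int nqueues, int threads_per_node,
              void (*process)(long job, int tid))
{
    assert(nqueues > 0 && threads_per_node > 0);
    job_queue *queues = malloc((size_t)nqueues * sizeof *queues);
    if (queues == NULL)
        return -1;

    long total = 0;
    for (int t = 0; t < nqueues; t++) {
        queues[t].next = total;
        total += jobs_per_thread[t];
        queues[t].end = total;
    }

    long done = 0;
#pragma omp parallel num_threads(nqueues) reduction(+:done)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        long job;
        /* Even if the runtime starts fewer threads than queues, find_job
           scans every queue, so all jobs are still executed. */
        while ((job = find_job(queues, tid, nqueues, threads_per_node)) >= 0) {
            process(job, tid);
            done++;
        }
    }
    free(queues);
    return done;
}
```

To use it, compile with `-fopenmp` (GCC/Clang) and run with, for example, `OMP_PLACES=cores OMP_PROC_BIND=close`, setting `threads_per_node` to the number of cores per NUMA node. Whether consecutive thread numbers really land on the same node depends on how the OS numbers the cores, so check the mapping (for example with `omp_get_place_num()` or your hardware topology tool) before relying on it. The design uses a shared atomic counter per queue instead of a real work-stealing deque. Every job is claimed exactly once, but owner and thieves all take from the same end. That is simple and usually adequate when individual jobs are coarse enough that one atomic operation per job is cheap in comparison.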