Is TBB's task-stealing mechanism NUMA-aware? For example, assume a system with four sockets (NUMA nodes), each holding a four-core chip and its own low-latency local memory. When a particular core's task queue runs out of tasks, will it first try to steal from cores on the same socket and only then from remote ones? Are there any other NUMA-related performance issues?