Ronald Green's paper on thread affinity control ( http://software.intel.com/en-us/articles/openmp-thread-affinity-control ) mentions that MPSS uses some of the coprocessor's cores to manage offload processes. For offload applications, the paper recommends setting the number of threads to 4*(N-1), where N is the number of physical cores, in order to free up 4 logical cores (one physical core) for MPSS.
However, I remember older documents suggesting that specific logical cores are allocated to MPSS. I believe it was logical cores 0, 1, 238 and 239 on a 60-core coprocessor, or something of that sort. Is this still the case in MPSS Gold and later?
What I want to know is: for best performance, should I set MIC_KMP_AFFINITY=explicit,proclist={something specific}, or is it enough to set MIC_OMP_NUM_THREADS=236 and MIC_KMP_AFFINITY=balanced/compact/scatter (depending on the application)?
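
To make the two options concrete, here is a rough Python sketch of what I have in mind. It only builds the environment variable values; it does not talk to MPSS. The function names are my own, and the reserved logical core IDs passed in the example are just my guess from the older documents, not something I have confirmed. I am also assuming the usual bracketed proclist=[...] form of KMP_AFFINITY.

```python
def offload_thread_count(physical_cores, threads_per_core=4, reserved_cores=1):
    """Thread count from the 4*(N-1) rule: leave one physical core to MPSS."""
    return threads_per_core * (physical_cores - reserved_cores)


def explicit_affinity(total_logical, reserved):
    """Build an explicit affinity string that skips the reserved logical cores.

    `reserved` is an assumption supplied by the caller; which logical cores
    MPSS really uses is exactly what I am asking about.
    """
    skip = set(reserved)
    usable = [cpu for cpu in range(total_logical) if cpu not in skip]
    return "explicit,proclist=[" + ",".join(str(cpu) for cpu in usable) + "]"


if __name__ == "__main__":
    n_threads = offload_thread_count(60)  # 4 * (60 - 1) = 236

    # Option A: just cap the thread count and pick a standard affinity type.
    print("MIC_OMP_NUM_THREADS=%d" % n_threads)
    print("MIC_KMP_AFFINITY=balanced")

    # Option B: pin explicitly, avoiding the (guessed) MPSS logical cores.
    guessed_reserved = [0, 1, 238, 239]  # unverified, from memory
    print("MIC_KMP_AFFINITY=" + explicit_affinity(240, guessed_reserved))
```

Option A relies on the runtime to keep threads off whatever MPSS uses, while Option B is only as good as the reserved list I feed it, which is why I would like to know the actual core numbers.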



