Problem with setting number of workers in MIC

Hi Everyone,

I am trying to write a hybrid application that uses both the CPU and the MIC at the same time. In my job script I added the following lines:


export MIC_KMP_AFFINITY=scatter

Then I tried to see how many workers were actually running on the MIC, so I printed the value of __cilkrts_get_nworkers() on the MIC. Surprisingly, it always returned 1!

I also tried to set the worker count explicitly, but that attempt failed too. Does anyone know a solution to this problem? How can I ensure that the MIC uses N cores? If anyone is aware of a solution, please let me know.

Thanks in advance.

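#include <iostream>
#pragma offload_attribute(push,target(mic))
#include <cilk/cilk_api.h>
#pragma offload_attribute(pop)
__attribute__((target(mic))) void kernel() {
    if (0 != __cilkrts_set_param("nworkers", "244"))  // 0 means success
        std::cout << "Failed to set worker count\n";
}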
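When __cilkrts_set_param() fails, printing the actual return value narrows down the cause more than a fixed message does. Below is a minimal sketch of a hypothetical helper, set_param_status_text() (our name, not part of the Cilk API), that turns the status into text. The numeric values are an assumption: they mirror the __cilkrts_set_param_status codes as we recall them from <cilk/cilk_api.h> (SUCCESS = 0, UNIMP = 1, XRANGE = 2, INVALID = 3, LATE = 4), so check them against your own header.

```c
/* Hypothetical helper. The codes below are ASSUMED to match the
   __cilkrts_set_param_status values in <cilk/cilk_api.h>; verify
   them against the header shipped with your compiler. */
static const char *set_param_status_text(int status)
{
    switch (status) {
    case 0: return "success";                   /* ..._SUCCESS */
    case 1: return "unimplemented parameter";   /* ..._UNIMP   */
    case 2: return "value out of range";        /* ..._XRANGE  */
    case 3: return "invalid value";             /* ..._INVALID */
    case 4: return "too late: runtime already started"; /* ..._LATE */
    default: return "unknown status";
    }
}
```

Inside kernel(), one could keep the result as `int rc = __cilkrts_set_param("nworkers", "244");` and print `set_param_status_text(rc)` instead of the fixed message (the helper would also need `__attribute__((target(mic)))` to be callable there). As we understand the runtime, a LATE status would mean the Cilk workers had already been started before the call.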





Hi Jesmin, 

You seem to be doing the right things to control the number of threads on the coprocessor. Here is a small program that I used to test your issue:

#include <stdio.h>
#pragma offload_attribute(push,target(mic))
#include <cilk/cilk.h>
#include <cilk/cilk_api.h>
#pragma offload_attribute(pop)

int main(){
    #pragma offload target(mic)
    {
        printf("#Threads because of environment variable:%d.\n", __cilkrts_get_nworkers());
        /* Request 240 workers; the call runs inside the offload, so it applies to the coprocessor. */
        __cilkrts_set_param("nworkers", "240");
        printf("#threads after call to __cilkrts_set_param(): %d.\n", __cilkrts_get_nworkers());
    }
    return 0;
}

Before running, I set the following environment variables:




and this is the output which I got: 

#Threads because of environment variable:60.
#threads after call to __cilkrts_set_param(): 240.

Could you please check whether your code is actually offloading to the coprocessor? Also, could you please confirm that your coprocessor is online by using /opt/intel/micinfo? Finally, if you are using the Cilk runtime function call to set the number of threads, please make sure that you call this function within an offload so that it sets the number of threads on the coprocessor.


The  MIC_ENV_PREFIX stuff applies to offloaded regions only.  That isn't what most people mean by hybrid.
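The forwarding rule behind MIC_ENV_PREFIX can be modeled in a few lines. As we understand the Intel offload documentation, once MIC_ENV_PREFIX=MIC is set on the host, only host variables named MIC_<NAME> are copied into the offload process, and they arrive there as <NAME>. The sketch below is a hypothetical model of that rule (forwarded_entry() is our name, not an Intel API); it does not talk to the coprocessor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of MIC_ENV_PREFIX forwarding (assumed behavior):
   with prefix "MIC", a host entry "MIC_<NAME>=<value>" reaches the
   offload process as "<NAME>=<value>". Returns a pointer into `entry`
   just past "<prefix>_", or NULL if the entry would not be forwarded. */
static const char *forwarded_entry(const char *entry, const char *prefix)
{
    size_t len = strlen(prefix);

    /* Entry must start with the prefix followed by an underscore. */
    if (strncmp(entry, prefix, len) != 0 || entry[len] != '_')
        return NULL;
    return entry + len + 1;
}
```

Under this model, forwarded_entry("MIC_KMP_AFFINITY=scatter", "MIC") yields "KMP_AFFINITY=scatter", while an unprefixed entry such as "CILK_NWORKERS=60" yields NULL, i.e. a host setting without the MIC_ prefix would not reach the offloaded region once the prefix is in use.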

In my own tests, I haven't had success with CILK_NWORKERS equal to or exceeding the number of cores.
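The arithmetic behind these worker counts can be made explicit. Knights Corner coprocessors run 4 hardware threads per core, so the 244 requested earlier in the thread corresponds to every hardware thread of a 61-core card. The sketch below uses hypothetical helpers (suggested_workers() and format_nworkers() are our names) to derive a worker count and format it as the string value that __cilkrts_set_param() takes. Leaving the last core free is an assumed convention (it is commonly reserved for the offload daemon), not something tested in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Knights Corner cores each run 4 hardware threads. */
#define THREADS_PER_CORE 4

/* Hypothetical helper: worker count for a card with `cores` cores.
   reserve_last_core != 0 leaves the last core free (ASSUMED
   convention for the offload daemon, not confirmed here). */
static int suggested_workers(int cores, int reserve_last_core)
{
    int usable = reserve_last_core ? cores - 1 : cores;
    assert(usable > 0);
    return usable * THREADS_PER_CORE;
}

/* __cilkrts_set_param() takes the value as a string, so format it.
   Returns 0 on success, -1 if the buffer is too small. */
static int format_nworkers(int workers, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%d", workers);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a 61-core card, suggested_workers(61, 0) gives 244 and suggested_workers(61, 1) gives 240; the formatted buffer can then be passed as the second argument of __cilkrts_set_param("nworkers", ...). Whether a given count behaves well with Cilk Plus is exactly what the observation above calls into question.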

KMP_AFFINITY=scatter applies to OpenMP (which is what you get with an OpenMP offload region), and OpenMP doesn't necessarily work together with Cilk+ parallelism. You can usually alternate between them if you allow for the KMP_BLOCKTIME latency before the OpenMP thread pool is disbanded.

Thanks Sumedh and TimP.

I tried setting the number of workers from the CPU code and that works! I think it is not possible to set the number of workers from inside an offloaded function.


Thanks again.


