Suppose I have a machine on which I can run, say, 8 simultaneous threads. Now suppose I have 2 distinct "omp parallel" blocks, and I want to use all 8 threads in both blocks.
However, one parallel block has BLAS calls and the other does not. If I have 8 threads running in these 2 blocks, will MKL try to parallelize the BLAS call when it is reached? I'm afraid that this will slow things down, because there are no more threads available on my machine. I would still like to take advantage of the MKL BLAS speedup, but I do not want MKL to multi-thread because I am already multi-threading at a higher level.
How do I control this? I understand that I have the environment variables OMP_NUM_THREADS and MKL_NUM_THREADS. However, the BLAS routines are not called in every parallel block, and it is my understanding that these environment variables are only read once.
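
For concreteness, the environment-variable approach I have in mind would look roughly like the sketch below. The values assume my 8-thread machine, and I am assuming that setting MKL_NUM_THREADS to 1 keeps each BLAS call on a single thread. What I can't tell is whether this is the right approach when only one of the two blocks calls BLAS.

```shell
# Sketch only: values assume an 8-thread machine.
export OMP_NUM_THREADS=8   # threads for my own "omp parallel" blocks
export MKL_NUM_THREADS=1   # assumption: each MKL BLAS call then runs on one thread

# Show what the program would inherit when launched from this shell.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
```

The program would then be started from the same shell so that it inherits both settings.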
Can somebody comment on how this threading can be controlled?