OMP_PROC_BIND is now supported on compatible non-Intel processors

The newest versions of the Intel® C++ and Fortran compilers support the OpenMP* environment variable OMP_PROC_BIND on compatible non-Intel processors for Linux* and Windows* platforms. The compilers containing the fix are Intel® Composer XE 2011 Update 13 and Intel® Composer XE 2013 Update 1. Earlier versions of these compilers do not support OMP_PROC_BIND, as defined by the OpenMP* Version 3.1 API specification, on non-Intel processors. With those earlier versions, setting OMP_PROC_BIND to either true or false on a non-Intel processor and running a program linked against the Intel® OpenMP* runtime produced warnings that affinity was not supported. This has been corrected: setting OMP_PROC_BIND=true binds OpenMP* threads to processors, and setting OMP_PROC_BIND=false allows OpenMP* threads to migrate between processors.
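
As a minimal sketch, the variable can be set for a single run from a POSIX shell on Linux*. The wrapper names run_bound and run_unbound and the program name ./my_omp_app are hypothetical and used only for illustration; in a Windows* command window the equivalent setting is "set OMP_PROC_BIND=true".

```shell
# Run a command with OpenMP thread binding enabled
# (threads stay bound to processors).
run_bound() {
    OMP_PROC_BIND=true "$@"
}

# Run a command with binding disabled
# (threads may migrate between processors).
run_unbound() {
    OMP_PROC_BIND=false "$@"
}

# Example usage (./my_omp_app is a hypothetical OpenMP executable):
#   run_bound ./my_omp_app
```

Setting the variable only for the launched command, rather than exporting it, keeps the choice from leaking into other programs started from the same shell.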

On Linux* systems only, GOMP_CPU_AFFINITY may be used to define a specific set of OS processor IDs to which OpenMP* threads are bound. Note that GOMP_CPU_AFFINITY takes precedence over OMP_PROC_BIND. If both are set in the execution environment and an Intel-compiled OpenMP* program is run, the following warning will be seen:

OMP: Warning #181: OMP_PROC_BIND: ignored because GOMP_CPU_AFFINITY has been defined
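
One way to avoid this warning, sketched here under the assumption that an explicit processor list is what is wanted, is to set GOMP_CPU_AFFINITY alone and leave OMP_PROC_BIND unset. The processor range 0-3 and the program name ./my_omp_app are hypothetical; the range syntax follows the GNU* convention for this variable.

```shell
# Linux only: bind OpenMP threads to OS processor IDs 0 through 3.
# The range "0-3" is an illustrative value, not a recommendation.
export GOMP_CPU_AFFINITY="0-3"

# GOMP_CPU_AFFINITY takes precedence, so remove OMP_PROC_BIND
# from the environment to avoid warning #181.
unset OMP_PROC_BIND

# Then launch the program (hypothetical executable name):
#   ./my_omp_app
```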


12 comments

(name withheld):

Hi Patrick,

Thanks for this clarification.

Since this machine is dedicated to a specific process, I should be able to guess how many cores are available by using the SYSTEM command 'TaskList /FI "imagename eq xxx"'.

Then the task is to set the correct affinity for the added job.

So it would be very helpful if you could show me an example of how "kmp_get_affinity(mask)" and "kmp_create_affinity_mask(mask)" can be called and used in FORTRAN code.

That way I would not have to worry about compiling the code for a specific affinity or setting an external environment affinity mask.

Thanks,

pbkenned1:

Hello,
No problem, the low-level affinity API is not trivial subject matter. We can help you to make certain a given process' threads bind to specific cores, but in general the compiler cannot help with cross-process issues. In other words, the compiler doesn't provide a means for process A to dynamically discover what cores process B is running on. There might be some convoluted way of doing it by calling SYSTEM, but that is outside my area of expertise.

On the other hand, if you want to hard-code thread bindings in the executable, you might be able to use -Qpar-affinity=. Or, you could hard-code the bindings in each executable with the low-level affinity API, but you would need some conditional logic in the source to build each executable with unique bindings. Incidentally, if you are using the low-level API, it will override the -Qpar-affinity setting, so that option is unnecessary in that case.

I'll see if I can outline using the low-level API if you want to go that route. An example will be easier to follow than all this discussion.

Regards,
Patrick

(name withheld):

Hi Patrick,

Thanks for this suggestion, but I still need your help to get going.

I have parallelized code that can either be specified to run with 1 core (hyperthreading was disabled) or N cores.

So I have two problems: first, to make sure that a thread stays on the same core, and second, to ensure that the threads of two simultaneously running processes don't vie for the same core.

Doesn't !$OMP MASTER effectively turn the parallel code into a single thread?

It seems that using the "kmp" level API can give me information about available cores.

E.g., it seems that kmp_create_affinity_mask "allocates a new OpenMP thread affinity mask, and initializes *mask to the empty set of OS procs".

If this command can identify available cores (presumably those that have not been bound to any other process), would you have an example of how these commands can be called from FORTRAN?

I tried to use "i_set = kmp_unset_affinity_mask_proc(1, 00000010)" to remove proc "1" from core "1" and the program bombed.

How do I compile the code so that it recognizes the "kmp" routines?

I am running ifort-11.1 and using the option -Qpar-affinity=verbose,compact,1,1. Do I need this?

Sorry for so many questions; but it's a learning experience for me.

Thanks,

pbkenned1:

Hello dfishman,
Is your single-threaded case running in an OpenMP parallel region but just using one thread, or is it purely serial code? It sounds like the former to me. If that is the case, then at the beginning of the parallel region, you can use !$OMP MASTER to get the full machine proc mask (i.e., the mask that indicates every hardware thread on the machine). The master thread (or actually any thread) knows how many threads are running in the parallel region. The master thread can then iterate over the number of OpenMP threads, and bind each thread, from 0 to N-1, to a specific hardware thread context (I hesitate to say 'core', since a core can be hyperthreaded). For example, for thread K, you could use kmp_unset_affinity_mask_proc(proc, mask) to remove all OS proc IDs from thread K's mask, except for the hardware thread you want K to execute on. So if K=3, and there are 8 hardware threads, set the mask for thread K to 00001000 (assuming threads are enumerated from 0 to N-1). This guarantees that OpenMP thread K will execute on hardware thread K for the extent of the parallel region.

(name withheld):

Hi Patrick, Oct. 31, 2013

After some digging I found that I am able to get the processor affinity by setting in the CMD window
"set KMP_AFFINITY=verbose,granularity=fine,proclist=[mask],explicit" .

The mask I was using for the 16-core machine is just 0,1,...,15, which assigns the 1st thread to the 1st core and so on.

However, I often run more than one job on this machine, and each job can need one or more cores.

Is it possible to generically assign one thread per core (and keep it there) irrespective of what else is running on the machine?

In this way I will not need to keep track of which jobs are running on which cores and run the risk of assigning two threads to one core.

Thanks,

(name withheld):

Hi Patrick,

Just another note for clarification.
I am seeing an enormous improvement when I set core affinity via the Windows Task Manager GUI.

Specifically, if I apply OMP_SET_NUM_THREADS(4) and specify 4 core affinities, the multithreaded code runs very well, while the single thread code runs much slower than it would have had I set OMP_SET_NUM_THREADS(1).

Ideally, I would like the code to recognize that it's single threaded and not use 4 cores.

So, it's this functionality that I am hoping I can accomplish by setting affinity dynamically within my code. (Some code cannot be parallelized, so I need to keep it single-threaded.)

Is this what kmp_set_affinity does, or is there a better, more efficient way of doing what I am looking for?

Thanks for your patience.

(name withheld):

Hi Patrick;
Thanks for your response. I am running in a Windows environment. For OpenMP I compile with -Qopenmp and use OMP_LIB in my FORTRAN code. I noticed that the code does not compile if I simply call kmp_set_affinity(). Do you have an example or a reference on how I can call the KMP routines from within the FORTRAN code?

Thanks,

pbkenned1:

omp_get_proc_bind() is a new feature added in the OpenMP API Spec 4.0. ifort-11.1 doesn't support the 4.0 Spec. Only the 14.0 Intel compilers have support for 4.0. I suggest using the low-level kmp affinity API, e.g., kmp_get_affinity(mask), kmp_set_affinity(mask), etc., to accomplish the same thing.

(name withheld):

Hi all,

I am also seeing a very large performance impact of thread migration when using a 6 and core CPU. The CPUs are 4960X and E5-2867W.

I am attempting to solve that problem with the use of OMP_PROC_BIND. However, I am having a problem with setting core affinity.

My code is in FORTRAN, and I am trying to use OMP_GET_PROC_BIND along with a call to OMP_PROC_BIND(.TRUE.).

I am using the ifort compiler. Version 11.1 Build 20100203 Package ID: w_cprof_p_11.1.060

I compile the code with -Qopenmp but I get the error "unresolved external symbol _OMP_GET_PROC_BIND".

All other OpenMP routines work, e.g. NTHREADS = OMP_GET_NUM_THREADS() etc.

Thanks,

Abhishek 81:

@Patrick: Thanks a lot, I will definitely go through it. This is the sort of motivation that inspires me; the people on this site are amazing. So much help to increase our knowledge.
