In OpenMP can I control the placement of threads?

Hi all,

I wrote a simple program with Intel's OpenMP C++ compiler (version 8.0), which is executed on a 4-processor SMP system running Linux. I have a couple of questions.

1. My program generates 4 threads, which is the same as the number of available processors. Can I safely assume that all threads will be placed on different processors? If not, is there any way I can pin the threads to specific processors?

2. Once a thread starts execution, can I prevent it from yielding to another thread or process until completion of the parallel construct it belongs to? In other words, can I suppress preemption, either within the job or outside it?

Thanks in advance,
Jay


Hi Jay,

Good questions. In general, it is a safe assumption that the latest Linux kernels will place the threads on different processors. Older kernels sometimes had trouble with thread placement but those problems have long since been resolved.

The kernel ultimately controls scheduling of CPU resources. However, Intel's OpenMP implementation has two environment variables to control thread behavior: KMP_LIBRARY and KMP_BLOCKTIME. Intel's OpenMP library has three modes: serial, throughput, and turnaround. In turnaround mode, the threads actively wait for more work rather than yielding the processors at the end of a parallel region. Turnaround mode is designed for dedicated systems. On a shared system, it can lead to poor resource utilization.

Throughput mode is the default. In throughput mode, the threads actively wait for more work but eventually yield the processors at the end of a parallel region. The KMP_BLOCKTIME variable controls how long the threads wait. It is set to 200 ms by default.

The compiler documentation has more complete descriptions of these environment variables in case you need more information.

Best regards,
Henry

Hi Henry,

Thanks for your reply. I have two follow-up questions just for clarification.

For the first question, my Linux kernel version is 2.4.19 - is it up-to-date enough not to cause the thread placement problem you mentioned?

For the second question, if I execute the threads in turnaround mode, is it guaranteed that no thread will migrate to a different processor or get preempted by another thread during execution of the parallel region?

Thanks,
Jay

Hi Jay,

The Intel compiler supports Linux kernels 2.4.x, so you're up to date. Your system should boot the SMP version of the kernel by default if you're using a multiprocessor or Hyper-Threading is enabled.

If you're running on a dedicated system with a 1:1 ratio of threads to processors, your threads are likely to remain in place and run without interruption. However, there's no guarantee, because the operating system controls CPU resources. The Intel OpenMP library does not control thread migration, regardless of whether you're using throughput or turnaround mode. Nor can the library prevent preemption. The operating system is free to migrate or preempt threads as it sees fit. Otherwise, a rogue application or library could commandeer the system.

Best regards,

Henry


Hello,

I have a Xeon(P4) that reports 4 CPUs running Linux kernel 2.4.21-4.ELsmp (RedHat EL 3.0). Job is compiled with ICC 8.1.021 using OpenMP.

I find that my single-threaded application (45-second runtime) is moved back and forth between CPU0 and CPU2, from which I infer that the OS is migrating it.

Running 2 threads (in a parallel region executed once, i.e. the flow is
ReadDataFiles => Execute2Threads => WriteOutputFiles), I find that if the threads start up on CPU0 and CPU2, they tend to stay that way (at least in the few iterations I ran) and the run time is fast (30 s). But sometimes they start up on CPU0 and CPU1, or they start up on CPU1 and CPU2 and then migrate to CPU2 and CPU3; in both of these cases the run time is long (~40 s), which I believe indicates that the threads spend a lot of time on the same physical CPU.

I would welcome any comments, suggestions or anything that would help me get predictable fast performance.

Thanks,
-rajeev-
Rajeev Rohatgi

Yes, this is a very sore point, and it explains why many people find the simplest remedy is to shut off Hyper-Threading in the BIOS. In such cases, some applications will run better with 4 threads, since that at least forces the work to be spread out. Work has been done on making the Linux scheduler do the right thing, but it has not appeared in the kernels shipped with production distros. Also, processor affinity calls should be available in some newer kernels, but requiring those for simple cases like yours is unsatisfactory. A corrected scheduler would quickly move a second thread to an idle physical CPU.

Thanks Tim !

-rajeev-
Rajeev Rohatgi

Tim,

First off, thanks for your patience with some of my posts that veer off-topic.

But this BIOS disabling has got me wondering... how would I tell the difference between a Xeon (P3) and a Xeon (P4) with Hyper-Threading disabled? Right now I can do this by checking /proc/cpuinfo to see whether 2 or 4 CPUs are reported... but with Hyper-Threading disabled, presumably the P4 Xeon also reports 2 CPUs?

Surely there's a utility somewhere that precisely identifies an Intel processor?

Thanks,
-rajeev-
Rajeev Rohatgi

One of the more common reasons for wanting to distinguish between a P-III and a P4 is to determine whether SSE2 is supported. There is a flag accessible via the cpuid instruction for that purpose. This hasn't been as important since the Intel compilers incorporated command-line switches that permit generation of both SSE and SSE2 code, with the cpuid logic hidden from the programmer.
