OMP_WAIT_POLICY

Hi there,

I'm testing the behavior of the OMP_WAIT_POLICY environment variable with an application that performs several million OpenMP barriers. However, it takes the same time regardless of whether the value is "active" or "passive". I've studied an Amplifier trace and the profiles are nearly identical: the most expensive routine is __kmp_wait_sleep, followed by __kmp_static_yield, even when OMP_WAIT_POLICY=active. It seems the runtime is always running the passive policy.

I run in native mode, but I observed the same behavior running on the host (without using the MIC).
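For reference, the benchmark is essentially a loop like this minimal sketch (the iteration count and timing are illustrative, not my exact code):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const long iters = 5000000;   /* several million barriers */
        double t0 = omp_get_wtime();

        #pragma omp parallel
        {
            /* every thread executes the same sequence of barriers */
            for (long i = 0; i < iters; i++) {
                #pragma omp barrier
            }
        }

        printf("%ld barriers in %.3f s\n", iters, omp_get_wtime() - t0);
        return 0;
    }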

I'm using ICC 13.0.0

Is this a known issue?

Thanks!

Barcelona Supercomputing Center

Hi,

I am working on your issue. If I do not get back to you in a couple of days, please ping me.

Regards
--
Taylor

Hi Diego,

You will likely see no performance difference from the OMP_WAIT_POLICY setting unless your system is oversubscribed. This setting controls whether threads yield the processor while they are spinning.

We have one more control that deals exactly with spinning/sleeping of threads: KMP_BLOCKTIME. This setting is independent of OMP_WAIT_POLICY, and controls how many milliseconds a thread spins before it goes to sleep.

So if you set, for example, KMP_BLOCKTIME=0, then the behavior of the OpenMP runtime will be "very passive": threads will go to sleep immediately at a barrier. If you use only OMP_WAIT_POLICY, then threads will either yield or not yield while spinning, which does not affect performance much (although it does significantly affect performance in case of oversubscription).
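If you want to control this from the code instead of the environment, the runtime also exposes the block time through the kmp_set_blocktime()/kmp_get_blocktime() extensions declared in the Intel omp.h (a minimal sketch; these are Intel-specific, not portable OpenMP):

    #include <omp.h>    /* Intel's omp.h declares the kmp_* extensions */
    #include <stdio.h>

    int main(void)
    {
        /* Same effect as KMP_BLOCKTIME=0: threads go to sleep immediately
           when they run out of work, instead of spinning for the default
           200 ms first. */
        kmp_set_blocktime(0);
        printf("blocktime = %d ms\n", kmp_get_blocktime());

        #pragma omp parallel
        {
            #pragma omp barrier   /* threads block here rather than spin-wait */
        }
        return 0;
    }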

Regards,
Andrey

Thank you Andrey,

I've run some tests using KMP_BLOCKTIME=70s and OMP_WAIT_POLICY=active without any difference in time. The most time-consuming routines are still __kmp_wait_sleep and __kmp_static_yield, which seems odd given how the environment variables are configured.
I'm basically trying to measure the impact of the barrier algorithm. If those routines are considered part of the barrier, then that's fine.

Regards.

Barcelona Supercomputing Center

Hi Diego,

Yes, these routines are part of the barrier in our OpenMP runtime, and by setting KMP_BLOCKTIME=70s you asked the OpenMP threads to spend more time (compared to the default 0.2s) actively spinning in __kmp_wait_sleep when there is no work for them. To exclude these routines from the hot spots you can try setting KMP_BLOCKTIME=0, as I suggested earlier.

And again, the impact of OMP_WAIT_POLICY will be negligible if you don't oversubscribe the machine. E.g. if you have 16 processors, try launching 128 OpenMP threads (many threads per processor), and then you should definitely see the impact of the OMP_WAIT_POLICY setting. If you launch 16 or fewer threads on such a machine, the wait policy is unlikely to impact the performance of the application.
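As an illustration, a sketch like the following should make the difference visible (assuming a 16-core host; the thread and iteration counts are only examples). Run it once with OMP_WAIT_POLICY=active and once with OMP_WAIT_POLICY=passive and compare the times:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Deliberate oversubscription: many more threads than cores.
           With OMP_WAIT_POLICY=passive the spinning threads yield the
           core to runnable siblings, so this should run noticeably
           faster than with OMP_WAIT_POLICY=active. */
        omp_set_num_threads(128);

        double t0 = omp_get_wtime();
        #pragma omp parallel
        {
            for (int i = 0; i < 100000; i++) {
                #pragma omp barrier
            }
        }
        printf("%.3f s\n", omp_get_wtime() - t0);
        return 0;
    }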

Regards,
Andrey
