OpenMP Stack "automatic" and "auto-scalar"

We have a large serial Fortran (F90/F95 and later) code and are trying to use OpenMP to improve performance. We have
chosen one loop and have introduced the OpenMP directives. The speedup for the loop is excellent
when comparing 1 thread to several threads. HOWEVER, we are seeing a rather large performance hit (10-20%)
when we compare the serial build to the OpenMP build running with 1 thread.

We only have 15 scalar variables that are "private".

We have read that when using ifort (for some older version), the -openmp flag also turns on
the -automatic flag, which in turn causes the local variables to no longer be statically allocated.
Is it possible that this could be causing such a large performance degradation?

Is this a known condition? Any ideas of things to try?

Linda
Steve Lionel (Intel)

It's true that enabling OpenMP implies -automatic. I am not aware that this would cause a performance degradation of that degree - it is usually at the "noise" level. On the other hand, just adding OpenMP overhead to the mix and restricting it to one thread will very likely have a noticeable effect, though it would be unusual to be that large. Do you expect the program to run in that environment?

Steve

I have seen ifort "choke" on an excessive number of OpenMP private variables. Whether the number you mention is "excessive" may depend on context, and whether you are compiling for 32- or 64-bit mode.
As an example, the SPECfp CactusADM application runs better with auto-parallelization than with ifort -Qopenmp, because a large number of variables that the compiler could optimize away when not using OpenMP have to be specified as OpenMP private. So it's well worthwhile to rewrite the source to reduce the number of private variables.
If your application is flaky in a manner which is affected by -Qauto, it's worthwhile to check it out with -Qauto before going to -Qopenmp.
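
One possible way to shrink a PRIVATE list, sketched here as an assumption rather than as the fix used in this thread, is to move the loop body into a procedure. Non-SAVEd locals of a procedure called from a parallel region are private to each thread, so they no longer need to be listed. The names kernel_mod, kernel, a and b are hypothetical, and whether this helps ifort in any particular case would have to be measured.

```fortran
module kernel_mod
  implicit none
contains
  ! Hypothetical loop body. The temporaries a and b are local to the
  ! procedure, so each calling thread gets its own copies and they do
  ! not have to appear in a PRIVATE clause.
  pure subroutine kernel(xi, yi)
    real, intent(in)  :: xi
    real, intent(out) :: yi
    real :: a, b
    a = 2.0 * xi
    b = a + 1.0
    yi = a * b
  end subroutine kernel
end module kernel_mod

program demo
  use kernel_mod
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  ! The loop index i is private automatically; only the arrays are listed.
  !$omp parallel do default(none) shared(x, y)
  do i = 1, n
    call kernel(x(i), y(i))
  end do
  !$omp end parallel do
  print *, y(1)   ! a = 2, b = 3, so y(1) = 6
end program demo
```

Compiled without an OpenMP option, the directive lines are treated as comments and the program runs serially, which makes it easy to compare against the OpenMP build with one thread.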

"flaky in a manner which is affected by -Qauto"

How does one define or detect that? Flaky - takes longer? Flaky - gives different answers? Flaky - some other way?

Linda

The primary effect of -Qauto, with no other option changes, is to remove effective SAVE status from local arrays. Making the procedure RECURSIVE should have similar effect. If you have incorrectly initialized local arrays, or, possibly, data over-runs, those will have to be corrected to make it work reliably with -Qauto or RECURSIVE, along the way toward making it capable of working in OpenMP. If SAVE is required, it should be applied explicitly. Such arrays would have to be shared arrays under OpenMP, presenting difficulties in procedures inside parallel regions.
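
A minimal sketch of the point about explicit SAVE, using hypothetical names (counters, accumulate, running, work) that are not from the original code: a local that must keep its value between calls is declared with SAVE, while a scratch array is assigned on every call so it does not depend on static allocation.

```fortran
module counters
  implicit none
contains
  subroutine accumulate(x, total)
    real, intent(in)  :: x
    real, intent(out) :: total
    ! Explicit SAVE: this value must persist between calls, so say so
    ! rather than relying on locals being statically allocated.
    real, save :: running = 0.0
    ! Scratch array: not SAVEd, so it is fully assigned before use on
    ! every call and behaves the same with automatic (stack) storage.
    real :: work(4)

    work = x
    running = running + sum(work)
    total = running
  end subroutine accumulate
end module counters

program demo
  use counters
  implicit none
  real :: t
  call accumulate(1.0, t)   ! running is now 4.0
  call accumulate(2.0, t)   ! running is now 12.0
  print *, t
end program demo
```

Note that a SAVEd variable such as running is shared by all threads if the procedure is called inside a parallel region, so updates to it would need protection (for example a critical section or a reduction in the caller).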

That's how we've been treating the code: if you need a SAVE, it needs to be explicit. The Intel compiler has been pretty good at pointing out those problems (variables that are uninitialized at the point where they are used).

I recompiled with -Qauto and 2 of our 400+ test files fail. I am investigating.

Linda

FYI, I tracked both failures down to initialization problems. Nothing to do with -Qauto.

Linda
