We have a large serial Fortran 90/95 code and are trying to use OpenMP to improve performance. We
have chosen one loop and added OpenMP directives to it. The speedup for that loop is excellent when
comparing 1 thread to several threads. However, we are seeing a rather large performance hit (10-20%)
when we compare the serial build to the OpenMP build running with 1 thread.
We only have 15 scalar variables that are "private".
We have read that with ifort (at least for some older versions), the -openmp flag also turns on
the -automatic flag, which in turn causes local variables to no longer be statically allocated.
Is it possible that this could be causing such a large performance degradation?
Is this a known condition? Any ideas of things to try?
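
To make the comparison concrete, here is a minimal sketch of the kind of loop and timing described
above. It is not the real code: the program name, array names, loop body, and problem size are all
hypothetical, and only two private scalars are shown instead of 15. It uses system_clock rather than
an OpenMP timer so that the same source builds with or without the OpenMP flag.

```fortran
! Hypothetical sketch: every name and the loop body are illustrative only.
program omp_overhead_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 10000000
  real(dp), allocatable :: a(:), b(:)
  real(dp) :: t1, t2              ! stand-ins for the private scalars
  integer(i8) :: c0, c1, rate
  integer :: i

  allocate(a(n), b(n))
  a = 1.0_dp
  b = 0.0_dp

  call system_clock(c0, rate)     ! wall-clock ticks and ticks per second

  ! Without an OpenMP flag this directive is an ordinary comment,
  ! so the loop runs as plain serial code.
  !$omp parallel do default(shared) private(i, t1, t2)
  do i = 1, n
     t1 = sqrt(a(i))
     t2 = t1*t1 + 2.0_dp
     b(i) = t1 + t2
  end do
  !$omp end parallel do

  call system_clock(c1)

  print *, 'loop time (s) =', real(c1 - c0, dp) / real(rate, dp)
  print *, 'checksum      =', sum(b)
end program omp_overhead_sketch
```

One way to separate the two effects would be to time three builds of the same source: serial,
serial with -automatic added, and OpenMP with OMP_NUM_THREADS=1. A sketch this small has no local
variables in subroutines, so the -automatic comparison would only be meaningful on the real code.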