O2 optimization flag is the fastest

Through rigorous testing of a parallel CFD code, I have noticed that the fastest executable is generated with the following options:

-i4 -r8 -O2 -fp-model precise

Adding anything else, such as -xHost or -O3, results in slower code. (-O0 and -O1 are four times slower than the chosen flags.)
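For anyone who wants to repeat this kind of comparison, here is a minimal sketch that only prints one compile command per candidate flag set, so each build can then be timed separately. The driver name `ifort`, the source file `solver.f90`, and the `solver_<tag>` output names are assumptions for illustration, not details from the original code.

```shell
#!/bin/sh
# Hypothetical names: FC, SRC and the output naming scheme are assumptions.
FC="${FC:-ifort}"
SRC="${SRC:-solver.f90}"
# Flags kept constant across all variants (taken from the post above).
BASE="-i4 -r8 -fp-model precise"

# Print the compile command for one candidate optimization flag set.
build_cmd() {
    # Turn e.g. "-O2 -xHost" into "O2xHost" for use in the output file name.
    tag=$(printf '%s' "$1" | sed 's/[ -]//g')
    printf '%s %s %s -o solver_%s %s\n' "$FC" "$BASE" "$1" "$tag" "$SRC"
}

# Candidate flag sets discussed in the post.
for opt in "-O1" "-O2" "-O2 -xHost" "-O3"; do
    build_cmd "$opt"
done
```

Each printed command can be pasted into a terminal (or piped to `sh`), and the resulting executables timed on the same case with the same number of processes.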

The -fp-model precise option is a must for this type of software. In addition, the code contains numerous intrinsic MATMUL calls (maximum size 70x70), and even using Intel Math Kernel Library calls such as DGEMM results in slower code. The code is also unstructured, which means that there is a lot of indirect memory access.

The same pattern has been observed from Intel Fortran compiler version 7, when this code was developed, through every later version.

Thank you in advance for any comments or suggestions.



As you weren't specific about why you set -fp-model precise, I'll point out that options such as -assume protect_parens -prec-div -prec-sqrt usually give you equally correct numerical results with better performance.
It's difficult to comment on -xHost when you don't say which architecture that translates to on your platform. In my experience, -xSSE4.1 (and sometimes -mSSE2) often performed better than -xHost on the Westmere platform. I'm aware of reported issues with -xAVX in some unusual (and not fully described) circumstances.
If -O3 is hurting your matmul performance, you should consider trying it with -no-opt-matmul. The indexing of this option in the help file is under MATMUL, not under the option list, and the effect of -O3 in turning it on isn't described there. -O3 in-line optimization of matmul without the opt-matmul substitution of the MKL library call can be quite effective.
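A rough sketch of that comparison, assuming `ifort` as the driver and a hypothetical source file `solver.f90`. The `run` helper and the `DRY_RUN` variable are illustrative only; by default the script just prints the two command lines.

```shell
#!/bin/sh
# Assumptions: ifort is the compiler driver, solver.f90 is a hypothetical source file.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command and, unless in dry-run mode, execute it.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Variant A: plain -O3 (which, per the reply above, turns opt-matmul on).
run ifort -i4 -r8 -O3 -fp-model precise -o solver_o3 solver.f90

# Variant B: -O3 with the matmul library substitution disabled, keeping in-line matmul.
run ifort -i4 -r8 -O3 -no-opt-matmul -fp-model precise -o solver_o3_inline solver.f90
```

Timing both executables against the -O2 build on the same case should show whether the matmul substitution is what slows -O3 down.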
On the other hand, if your matmul problems are large enough to benefit from MKL threading under opt-matmul, you will likely need appropriate KMP_AFFINITY settings if running on a multi-CPU platform. Of course, if you use OpenMP explicitly, affinity and proper use of HT is important.
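As an illustration only, affinity settings of this kind are usually exported in the job script before launching the executable. The specific values below are assumptions and have to be matched to the actual core and socket layout.

```shell
#!/bin/sh
# Assumed values for illustration; adjust to the machine's topology.
# "verbose" makes the Intel OpenMP runtime report the thread binding at startup.
export KMP_AFFINITY="verbose,granularity=fine,compact,1,0"

# Hypothetical thread count: one thread per physical core on an 8-core node.
export OMP_NUM_THREADS=8

echo "KMP_AFFINITY=$KMP_AFFINITY OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With HT enabled, the `compact,1,0` form is meant to spread threads across physical cores before filling hyperthreads. The `verbose` output shows the actual binding, so check it before relying on this.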
It's difficult to believe you haven't seen changes in the treatment of matmul between ifort 7.1 and recent compilers which support the -xHost and opt-matmul options.

I am running a 3D CFD code. I have tried the recommended options given by the author (excluding -i4 -r8 due to some incompatibility) but there's not much difference. Maybe I'll try the new settings and see how it goes. Nevertheless, thanks for the recommendations.
