wrong vectorization with -O3 -openmp -parallel

Hi everybody,

I get wrong results from the attached program, depending on the compiler options:

-O3 -openmp -parallel ===> wrong results
-O3 -openmp ===> wrong results
-O3 -parallel ===> results OK
-O2 -openmp -parallel ===> wrong results
-O2 -openmp ===> results OK
-O2 -parallel ===> results OK

The attached program is a simplified version (extracted from a much larger program) that does not use OpenMP directly.
The problem seems to come from incorrect vectorization of the second loop in subroutine sectub in module mod_tube.
The problem can be avoided by inserting !DEC$ NOVECTOR in front of this loop. However, the fact that -O2 -openmp -parallel gives wrong results is very dangerous, and this kind of error is very difficult to detect in a large program.
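
For readers unfamiliar with the directive, here is a minimal sketch of where !DEC$ NOVECTOR goes. This is not the loop from the attached test program: the program name, the arrays (radius, surf, vol) and the arithmetic are hypothetical stand-ins chosen only to show a loop with a branch that computes two quantities per iteration. The directive is Intel-specific and applies to the DO loop that immediately follows it; other compilers should treat the line as an ordinary comment.

```fortran
! Hypothetical stand-in for the affected loop; every name here is illustrative.
program novector_sketch
  implicit none
  integer, parameter :: n = 8
  double precision, parameter :: pi = 3.141592653589793d0
  double precision :: radius(n), surf(n), vol(n)
  integer :: i

  ! Fill the input array with simple test values.
  do i = 1, n
     radius(i) = 0.1d0 * i
  end do

  ! The directive must sit directly in front of the loop it controls.
  ! It asks the Intel compiler not to vectorize that one loop.
!DEC$ NOVECTOR
  do i = 1, n
     if (radius(i) < 0.5d0) then
        surf(i) = pi * radius(i)**2
        vol(i)  = surf(i) * radius(i)
     else
        surf(i) = 2.0d0 * pi * radius(i)
        vol(i)  = surf(i) * radius(i)**2
     end if
  end do

  print '(a,es12.4)', 'vol(n) = ', vol(n)
end program novector_sketch
```

Because the directive affects only the loop that follows it, the rest of the program can still be built with the usual optimization options.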

My configuration: Fedora 14 (fc14) on x86_64 with the latest compiler version (update 2):
>uname -a
Linux localhost.localdomain 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
>ifort -v
Version 12.0.2

Results :
>ifort -fpp -xHost -ipo -O3 -no-prec-div -openmp -parallel test.f90
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_ifort8OY23W.o

>./a.out
run with 2 threads
wrong volt1(151) = 0.156332141435743D-04 should be 0.471704309244172D-04
wrong volt1(153) = 0.153276971692221D-04 should be 0.457167413104183D-04

Sorry for the size of the test program: this kind of error can be suppressed by rather minor changes in the code, and it was not easy to simplify the code without suppressing the error.

A few remarks:
1 - Running with only one thread (KMP_ALL_THREADS=1) does not suppress the error. This is logical because we are not using OpenMP directly.
2 - The variable SURFT1, which is computed just before VOLT1, is always correct, so it is unlikely that we are taking a wrong branch in the if / else if chain in the main body of the second loop in subroutine sectub.
3 - Removing the computation of SURFT1 from the loop suppresses the error, so the loop body must be sufficiently heavy to trigger the wrong vectorization.
4 - More surprising: removing the module leccon, by moving the declarations of the variables xx0, d0, etc. into module mod_tube and initializing them in the main program, also suppresses the error. The wrong vectorization therefore seems linked to the fact that the second loop of sectub accesses variables that belong to another module.

Thank you for the sample code, I know these are not always easy to derive.

I will investigate - I want to see if this is a regression from older compilers.

ron

As you don't have any OpenMP directives, I would ask whether -auto without -openmp also produces a problem. At first glance, I don't see any inadequately defined arrays where that should happen. I wonder why you didn't indicate whether -fp-model source could suppress the problem.

To Ron: yes, this is a regression from older versions of the compiler.

To Timintel: -auto without -openmp ===> results OK
-fp-model with precise, strict, or source ===> results OK. This is not surprising because, most likely, these options inhibit some optimizations and suppress the wrong vectorization of the loop.

This affects all versions of 12.0, including beta versions. It was not present in 11.1 and earlier.

And yes, it's that loop at line 98 that you identified. O3 is performing a loop transform here and vectorizing where it shouldn't OR doing some part of the transformed loop in vector mode. The phase is called the "HPO Vectorizer" if you're curious.

I will get a bug report started at highest priority. Bug ID DPD200165959

ron
