Issue in array operations in MPI application

Gleb D.

Hello, Intel team!

I am developing an MPI application and run it on one of the supercomputers of the Russian Academy of Sciences, which has IFC v12.0.3 installed. I may have found a bug in the compiler.

There is a real(4), allocatable, 2-dimensional array in my code (actually there are a lot of them, but only one causes problems). Depending on how many cores I use, the dimensions of the array vary from (81:102, 1:22) for 96 cores to (97:114, 1:18) for 150 cores.

First I set each element of this array individually, in a nested loop, like this:

evap(i,j) = 0.002 + real(j+2*i)/1000000.

and then, after the loop, I divide the entire array by a constant:

evap = evap/(24.*60.*60.)
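
Put together, a minimal sketch of the two steps (the bounds is/ie and js/je are placeholders for the per-process ranges, which depend on the core count):

   real(4), allocatable :: evap(:,:)
   integer :: i, j, is, ie, js, je

   is = 81; ie = 102; js = 1; je = 22   ! per-process bounds (the 96-core case)
   allocate(evap(is:ie, js:je))
   do j = js, je
      do i = is, ie
         evap(i,j) = 0.002 + real(j+2*i)/1000000.
      end do
   end do
   evap = evap/(24.*60.*60.)            ! array-wise division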

After the last operation I get different results for different numbers of cores, but only in a few elements of the array, located not far from its bounds. Only one bit seems to differ, the least significant bit of the mantissa, so I get slightly different arrays.

The point is that if the division is performed element by element, in another loop, I get identical values for all of the elements present in both versions of the array. Everything is also fine if the array is set to a constant before the array-wise division.
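
That is, this variant of the division gives bit-identical results on any number of cores (a sketch, using the same placeholder bounds as above):

   do j = js, je
      do i = is, ie
         evap(i,j) = evap(i,j)/(24.*60.*60.)   ! element-wise division
      end do
   end do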

I don't know whether this is a bug or a side effect of some optimization (my compilation flags are: -fpp2 -assume byterecl -module ./obj -free -fpe0 -check bounds -traceback), but it is really frustrating, since I rely on identical results to test the correctness of my code.

Please tell me where I am wrong, if I am, and how to fix this. Thank you in advance.

Respectfully, Gleb D.

Tim Prince

If you want to avoid the numerical issues associated with the compiler changing divide into multiply, or failing to observe parentheses, you will need options such as -assume protect_parens -prec-div -prec-sqrt.  Multiple -assume options are combined like this: -assume byterecl,protect_parens.

The option -fp-model source includes the precautions I already mentioned; it also removes optimizations which are likely to produce numerical differences with varying data alignment, and sets IEEE gradual underflow.  I don't know whether the current recommendation is -fp-model source or -fp-model precise; the two are the same for Fortran (but not for Intel C++).

Data alignment is quite difficult to control in 32-bit mode; you may prefer 64-bit Intel64 compilation.  Current ifort has options such as -align array32byte to improve default alignments.  I don't know whether this controls the alignment of ALLOCATE, but I would agree it's a buglet if ALLOCATE doesn't consistently produce 16-byte alignment.  You should be able to check this with the C_LOC intrinsic.
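
For example, something along these lines should print the alignment of the first element (a sketch; note that the array needs the TARGET attribute for C_LOC, and the TRANSFER from C_PTR to an integer address is a common but implementation-dependent idiom):

   program check_alignment
      use iso_c_binding, only: c_loc, c_intptr_t
      implicit none
      real(4), allocatable, target :: evap(:,:)
      integer(c_intptr_t) :: addr
      allocate(evap(81:102, 1:22))
      ! recover the raw address of the first element from the C_PTR
      addr = transfer(c_loc(evap(81,1)), addr)
      print '(a,i0)', 'address mod 16 = ', mod(addr, 16_c_intptr_t)
   end program check_alignment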

Different numerics at the ends of a loop used to be common in 32-bit mode, because the compiler would switch to x87 for the scalar remainder loops.  If something like that still happens, it could be a bug.

Gleb D.

The compilation flag -fp-model source seems to resolve the problem. Now everything works great! Thanks.

Also, is there an easy way to tell the compiler to turn off all possible optimizations, so that it behaves as predictably as it can?

Steve Lionel (Intel)

You want to remove every possible optimization?  -O0 will do that, but you'll lose a lot. Using -fp-model source and the -align option as Tim suggests should get you most of what you want.
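
For example, a compile line along these lines (a sketch; keep your existing flags, and note that -align array32byte requires a recent ifort; your_source.f90 is a placeholder) keeps optimization on while pinning down the floating-point semantics:

   ifort -O2 -fp-model source -align array32byte -assume byterecl,protect_parens \
         -fpp2 -free -fpe0 -check bounds -traceback -module ./obj your_source.f90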

Steve
