Different results on different Xeons with same ifort 14 executable

We migrated our simulation code from ifort 11.1.056 to ifort 14.0.2. With the new executable compiled with ifort 14.0.2, we get different results on different Xeon CPUs for some of our test cases. We never encountered this with the old executable built with ifort 11.1.

ifort, Linux RHEL 6.4 x64, statically linked.
The executable was built on a Xeon 5680, where the tests are OK. But some tests give different results when the same executable is run on a Xeon E5-2670 or E5-2650 machine. The tests use only one OpenMP thread.

F95 Flags used to compile:
-g -nbs -convert big_endian -fp-model source -override-limits -I$(MKLROOT)/include -I$(MKLROOT)/include/intel64/lp64 -traceback -O -xW -c
Link flags:
-static-intel -traceback  -openmp -Bstatic -lmpich -lmpl $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_sequential.a $(MKLROOT)/lib/intel64/libmkl_core.a -Wl,--end-group
Identical sources and Makefiles (except for the MKL paths, i.e. em64t instead of intel64) are used for compilation.

Adding the flag -fimf-arch-consistency=true or using -fp-model precise does not help; the deviations in the results stay the same. The different results are also produced by the debug executable (-check bounds -check format -check uninit -check pointers -warn unused -fpe0 -ftrapuv -debug extend  instead of -O).

Running the tests with the ifort 11.1 executable always gives identical results on the different Xeons.

Any hint or help would be highly appreciated.


The -xW option is obsolete but should have no effect, as it is equivalent to -msse2 (the default for x86_64).

-fp-model source and -fp-model precise are identical for ifort. They do eliminate some optimizations where numerical results vary with alignment, but they don't affect MKL. -fimf-arch-consistency changes the math library linkage to a library which should not attempt to take advantage of newer CPU architectures, but it doesn't apply to MKL.

-align array32byte seems an easy way to avoid some of the alignment differences occurring when MKL shifts automatically into AVX code.  You could also check whether the difference is associated with MKL by linking in the old MKL (set MKLROOT to the older version).

You should look into the MKL consistency options if your desire is to sacrifice the potential gain of AVX in favor of closer numerical results.



- I know that -xW is obsolete and have substituted -msse2 for it in our production makefile. I just kept it for the comparison to eliminate any possible impact.

- Interesting to learn that -fp-model source and precise are identical. Premier support suggested using precise instead of source.

- I will definitely try to re-use the old MKL. I thought about MKL as a possible cause myself but did not try to substitute the old one for the new one because I thought this might only generate more trouble.

- I might not completely understand your last comment. Are you saying that the MKL routines do use AVX (when available) although I only specified -msse2? I thought using -msse2 would prevent the use of any newer features like AVX. If I wanted to take advantage of these features, I thought I would need to set -mAVX as an option. Actually, I already tried this and naturally got a message that our old Xeon does not support it. If the MKL functions do use AVX, then this could really be a potential source of the different results.

Thanks a lot for your fast feedback. I will report if switching back to the old MKL version has an impact.





MKL was the root cause of the deviations.

Setting MKL_CBWR=SSE2 on the E5 Xeon eliminates the deviations.

Thanks for the link to the slides about CNR.
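For anyone who wants to apply the same fix from a test driver rather than a login shell, here is a minimal sketch in Python. It only relies on the MKL_CBWR environment variable mentioned above; the helper names and the executable and input names in the usage comment are hypothetical and not taken from our setup.

```python
import os
import subprocess


def cnr_environment(branch="SSE2", base_env=None):
    """Return a copy of the environment with MKL_CBWR set to `branch`.

    `base_env` defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_CBWR"] = branch  # ask MKL to stay on one code branch
    return env


def run_with_cnr(command, branch="SSE2"):
    """Run `command` (a list of strings) with MKL_CBWR set for the child only."""
    return subprocess.run(command, env=cnr_environment(branch), check=True)


# Hypothetical usage (names are placeholders):
# run_with_cnr(["./simulation", "testcase.inp"])
```

Setting the variable per child process keeps the rest of the session on the default MKL dispatch, which may be convenient when comparing runs with and without the setting.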

MKL doesn't observe the compiler options when deciding whether to use AVX; it detects at run time whether AVX is available and uses it, unless you set those MKL options.  As you saw, when taking advantage of AVX, roundoff may be slightly different.  If the differences are large, it doesn't mean the AVX result is incorrect, but you might take it as a warning about the level of accuracy achieved by your algorithm and choice of precision.
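To illustrate why a different vector width can change roundoff, here is a toy sketch in Python. It is not how MKL is implemented; it only models a reduction that keeps one partial sum per SIMD lane (2 lanes roughly standing in for SSE2 doubles, 4 lanes for AVX doubles). The function name and the deliberately extreme input values are assumptions chosen for illustration.

```python
def lane_sum(values, lanes):
    """Sum `values` using `lanes` independent partial accumulators,
    then combine the partial sums in order, like a simple SIMD reduction."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


# Exaggerated input: large values cancel, small values are easily lost.
values = [1e16, 1.0, -1e16, 1.0]

two_lanes = lane_sum(values, 2)   # the 1.0 terms are added to each other first
four_lanes = lane_sum(values, 4)  # one 1.0 is absorbed by 1e16 before it cancels

print(two_lanes, four_lanes)  # prints 2.0 1.0
```

The same numbers are added in both cases; only the order of the additions differs. With well-conditioned data the effect is usually limited to the last few bits, so large deviations point at the sensitivity of the algorithm rather than at one instruction set being wrong.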

If you wish to investigate for a gain associated with instruction set choice, either -msse4.1 or -msse4.2 would work on both the 56xx and newer CPUs.

Steve Lionel (Intel)

As Tim says, in Intel Fortran "precise" and "source" have the same meaning. They don't in Intel C++.

