I am having problems with an MPI code that uses Intel MKL and ifort (Composer version: 18.104.22.168). Each processor holds exactly the same matrix and performs the same sequential operations on it. I expect every processor to obtain exactly the same values, since they all use the same binaries and the same libraries, and the nodes are identical (two Sandy Bridge EP E5-2670 processors per node). However, routines such as CGEMM and CGESVD produce slightly different values on each processor, with a variation on the order of 1e-6 to 1e-8. This does not always happen, and it seems to depend on the number of processors being used.
Is this behaviour expected at all? The difference is at or near machine precision (for single precision), but aren't the individual cores supposed to perform the roundoffs in the same manner? If this behaviour is not expected, I can provide some example matrices.
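To make the roundoff question concrete, here is a minimal Python sketch of the kind of effect I suspect. It is not my actual Fortran/MKL code. It only emulates single-precision addition and sums the same numbers in two different orders. The helper names (`f32`, `sum_sequential`, `sum_chunked`, `relative_spread`) are made up for this illustration, and whether MKL really changes its internal summation order between runs is an assumption on my part.

```python
import random
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_sequential(values):
    """Accumulate left to right, rounding to single precision after each add."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


def sum_chunked(values, nchunks):
    """Sum each chunk separately, then add the partial sums.

    This mimics a routine that splits the same work differently
    (hypothetical; not taken from MKL).
    """
    size = (len(values) + nchunks - 1) // nchunks
    partials = [sum_sequential(values[i:i + size])
                for i in range(0, len(values), size)]
    return sum_sequential(partials)


def relative_spread(n=1000, seed=0, nchunks=8):
    """Relative difference between the two summation orders on identical data."""
    rng = random.Random(seed)
    values = [f32(rng.random()) for _ in range(n)]
    a = sum_sequential(values)
    b = sum_chunked(values, nchunks)
    return abs(a - b) / abs(a)


if __name__ == "__main__":
    # Same three numbers, two groupings, different single-precision results.
    print(f32(f32(1e8 + -1e8) + 1.0))
    print(f32(1e8 + f32(-1e8 + 1.0)))
    print(relative_spread())
```

Each ordering is fully deterministic when repeated, so in this sketch a difference can only come from the order of operations, not from the hardware rounding differently.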
Thanks in advance