We often read that parallel computation (OpenMP or MPI) is not deterministic. This is easy to observe in I/O, but I am seeing it in the numerics as well: comparing the serial and the MPI version of the same code, the difference in the error is close to 0.3%. The error is defined as the maximum difference between two subsequent time steps, so the error reported by the serial version and by the MPI version differ by roughly 0.3%. Although that difference is small, the solutions themselves also differ slightly, by up to 1% in certain regions of the domain, and the difference becomes more pronounced as I increase the number of processes. Still, it remains small.
I was wondering about the cause of this behaviour. I also have two more questions. First, could the problem come from compiling with the -O3 flag? I have heard that aggressive optimization is not recommended with MPI, but I have not found anything solid to back that up.
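For reference, my understanding (an assumption worth checking against your compiler's documentation) is that with GCC/Clang, plain -O3 preserves IEEE floating-point semantics, while it is the fast-math family of flags that relaxes them and can change results. The file name `solver.c` below is just a placeholder:

```shell
# -O3 alone should not change floating-point results (GCC/Clang):
mpicc -O3 solver.c -o solver

# These DO relax IEEE semantics and can change the numerics:
mpicc -O3 -ffast-math solver.c -o solver
mpicc -Ofast solver.c -o solver   # -Ofast implies -ffast-math
```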
Second, as a matter of good practice, can MPI_REDUCE + MPI_BCAST be replaced by a single MPI_ALLREDUCE? I have heard that this replacement is discouraged, but I have not found it stated anywhere in the literature. I currently use MPI_REDUCE followed by MPI_BCAST so that the whole computation uses the same time step.
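To make the question concrete, here is a sketch of the two patterns for the time-step synchronization (the `local_dt` value is a hypothetical stand-in for whatever stability criterion each rank computes). This needs an MPI installation to build and run:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical local stable time step on this rank. */
    double local_dt = 0.1 / (rank + 1);
    double global_dt;

    /* Pattern 1: reduce the minimum onto rank 0, then broadcast it. */
    MPI_Reduce(&local_dt, &global_dt, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Bcast(&global_dt, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Pattern 2: one collective, same result delivered to every rank. */
    MPI_Allreduce(&local_dt, &global_dt, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);

    printf("rank %d uses dt = %g\n", rank, global_dt);
    MPI_Finalize();
    return 0;
}
```

Note that for MPI_MIN/MPI_MAX the reduction result does not depend on evaluation order at all (min and max are exactly associative), so in this particular case I would expect the two patterns to give bitwise-identical time steps; order sensitivity matters mainly for MPI_SUM.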
Finally, regarding 1D versus 2D domain decomposition, is there a rule of thumb for choosing between the two approaches? My problem is 2D, and I have seen other people use a 2D decomposition and end up with ghost cells in four directions (up, down, left and right), whereas with my 1D decomposition I only have left and right neighbours.
Thank you all!