Different math results in OpenMP depending on the number of threads

Hi,

I've encountered a very strange issue in OpenMP.

I wrote (with the help of tutorials) a simple program which computes Pi (and sums up the numbers from 1 to N):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define PI25DT 3.141592653589793238462643

static long nbins = 1<<30;

int main (int argc, char *argv[])
{
    long int i;
    double x, pi, sum = 0.0, sum2 = 0.0, bin;
    if (argc>1) nbins = atol(argv[1]);
    bin = 1.0/(double) nbins;
    /* each thread accumulates private partial sums; OpenMP combines them at the end */
    #pragma omp parallel for reduction(+:sum,sum2) private(x)
    for (i=1; i<=nbins; i++)
    {
        x = (i-0.5)*bin;        /* midpoint of bin i */
        sum += 4.0/(1.0+x*x);
        sum2 += i;              /* integer test sum 1..N */
    }

    pi = bin * sum;
    printf("Int test sum %f\n", sum2);
    printf("%10ld steps %.15f diff: %.4g\n", nbins, pi, fabs(pi-PI25DT));
    return 0;
}

It gives surprisingly different results depending

1) on the schedule used in the omp parallel for pragma (static|dynamic|guided)

2) on the number of threads used (set by set OMP_NUM_THREADS=1|2|3|4)

OMP_NUM_THREADS=2

C:\toastpi>pi-omp.exe 1000000
Static (default) explicit schedule:
Int test sum 500000500000.000000
1000000 steps 3.141592653589916 diff: 1.23e-013
Dynamic explicit schedule:
Int test sum 500000500000.000000
1000000 steps 3.141592653589932 diff: 1.39e-013
Guided explicit schedule:
Int test sum 278328760174.000000
1000000 steps 3.141592653589938 diff: 1.448e-013

OMP_NUM_THREADS=3

C:\toastpi>pi-omp.exe 1000000
Static (default) explicit schedule:
Int test sum 500000500000.000000
1000000 steps 3.141592653589883 diff: 8.971e-014
Dynamic explicit schedule:
Int test sum 500000500000.000000
1000000 steps 3.141592653589906 diff: 1.132e-013
Guided explicit schedule:
Int test sum 173337045946.000000
1000000 steps 3.141592653589877 diff: 8.349e-014

OMP_NUM_THREADS=4

C:\toastpi>pi-omp.exe 1000000
Static (default) explicit schedule:
Int test sum 500000500000.000000
1000000 steps 3.141592653589878 diff: 8.482e-014
Dynamic explicit schedule:
Int test sum 500000500000.000000
1000000 steps 3.141592653589903 diff: 1.097e-013
Guided explicit schedule:
Int test sum 169875681895.000000
1000000 steps 3.141592653589882 diff: 8.882e-014

Diff is the difference between the computed Pi and the manually defined value. For a higher number of steps the differences get even worse. The computer has 4 physical cores. Will someone explain this behavior to me, please? I had always thought that the parallel for computes over all the iterations and that the floating-point rounding error should be the same or at least very similar. I am quite shocked that even a sum of integers gives a different result...

Why?

Thank you,

quite confused Martin

The effect you observe has nothing to do with threading as such.

If you sum up floating-point numbers, the result can depend heavily on the exact sequence of operations: every addition is rounded, so floating-point addition is not associative. When summing an array of numbers whose magnitudes vary widely, it is common practice to sort the array by ascending magnitude and then sum it, to minimize roundoff errors.

Different schedules and thread counts change how the iterations are split into partial sums and the order in which the reduction combines them, so of course the sequence of operations changes.

I recommend http://docs.sun.com/source/806-3568/ncg_goldberg.html for a much better explanation than I could give.
