why?
code1
#include "stdafx.h"
#include "omp.h"
#define N 100000
int _tmain(int argc, _TCHAR* argv[])
{
int arx[N],ary[N];
int i,max_num_x=-1,max_num_y=-1;
for(i=0;i<N;i++) {
arx[i]=i;
ary[i]=N-i;
}
#pragma omp parallel for
for(i=0;i<N;i++) {
//#pragma omp critical(max_arx)
if(arx[i]>max_num_x)
max_num_x=arx[i];
//#pragma omp critical(max_ary)
if(ary[i]>max_num_y)
max_num_y=ary[i];
}

printf("max_num_x=%d max_num_y=%d\n",max_num_x,max_num_y);
return 0;
}

and
code2
#include "stdafx.h"
#include "omp.h"
#define N 100000
int _tmain(int argc, _TCHAR* argv[])
{
int arx[N],ary[N];
int i,max_num_x=-1,max_num_y=-1;
for(i=0;i<N;i++) {
arx[i]=i;
ary[i]=N-i;
}
#pragma omp parallel for
for(i=0;i<N;i++) {
#pragma omp critical(max_arx)
if(arx[i]>max_num_x)
max_num_x=arx[i];
#pragma omp critical(max_ary)
if(ary[i]>max_num_y)
max_num_y=ary[i];
}

printf("max_num_x=%d max_num_y=%d\n",max_num_x,max_num_y);
return 0;
}

Please tell me why the results of the two codes are identical. I don't understand why code1 shows no sign of a data race even though it has no #pragma omp critical.

Your compiler may choose atomic operations even though you don't specify them, as ICL would when you allow vectorization, or it may optimize the loops away, as gcc would. I am assuming there is no special implication to the use of a Microsoft C-like language, other than that you exclude the use of a standard compiler.

Aside from the issue that you shouldn't request more threads than are available (unless your system has more than 10 cores, i.e. hardware threads), consider how the loop is executed.

The parallel loop divides the iteration range into one chunk per thread, in this case 10 (assuming the usual static schedule). The 1st thread into the loop gets 0:N/10-1, the 2nd gets N/10:(N/10)*2-1, and so on.
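The chunking described above can be sketched as follows. This is only an illustration, assuming an even static split; the helpers `chunk_bounds` and `print_chunks` are hypothetical names, not part of OpenMP, and a real runtime is free to distribute iterations differently.

```c
#include <stdio.h>

/* Hypothetical helper (not an OpenMP API): computes the half-open range
   [*begin, *end) that thread `tid` out of `nthreads` would receive if
   `n` iterations were split evenly, as a static schedule typically does. */
void chunk_bounds(int n, int nthreads, int tid, int *begin, int *end)
{
    int base = n / nthreads;   /* minimum number of iterations per thread */
    int extra = n % nthreads;  /* the first `extra` threads get one more */

    *begin = tid * base + (tid < extra ? tid : extra);
    *end = *begin + base + (tid < extra ? 1 : 0);
}

/* Prints the first and last index handled by each thread. */
void print_chunks(int n, int nthreads)
{
    int tid, begin, end;

    for (tid = 0; tid < nthreads; tid++) {
        chunk_bounds(n, nthreads, tid, &begin, &end);
        printf("thread %d: i = %d .. %d\n", tid, begin, end - 1);
    }
}
```

Calling `print_chunks(100000, 10)` from `main` lists ten ranges of 10000 iterations each, the first starting at 0 and the last ending at 99999.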

The moment the 1st thread processes its first element, ary[0] = N, and stores it as the max, no thread (the 1st included) will ever find a larger value in ary, because ary[i] = N-i only decreases.

The moment the last thread processes the 1st element of its subsection of arx, that element becomes the new max, because it is larger than every element in the other subsections. From then on, no other thread will find a new max for arx; only the last thread will, on each of its subsequent iterations, since arx[i] = i keeps increasing.

Therefore, you will observe an incorrect result only if one of your threads gets evicted (preempted) after finding a local max but before storing it, and the eviction lasts longer than the run time of the 1st or last thread, as the case may be. The race is still there in code1; it is just very unlikely to show up with this data.
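If you want a result that does not depend on timing, one common pattern is to let each thread keep a private maximum and merge it into the shared one inside a single critical section. The following is a minimal sketch of that pattern, not the original poster's code; `parallel_max` and the critical section name `merge_max` are made up for illustration, and the function assumes n >= 1.

```c
#include <stdio.h>

/* Hypothetical helper: returns the largest element of a[0..n-1], n >= 1.
   Each thread scans its share of the array into a private maximum and
   then merges it into the shared result under mutual exclusion, so the
   shared variable is never read or written outside the critical section. */
int parallel_max(const int *a, int n)
{
    int i;
    int shared_max = a[0];

    #pragma omp parallel
    {
        int local_max = a[0];  /* declared inside the region: private */

        /* The loop variable of an omp for loop is private to each thread. */
        #pragma omp for
        for (i = 0; i < n; i++) {
            if (a[i] > local_max)
                local_max = a[i];
        }

        /* One short critical section per thread, not one per iteration. */
        #pragma omp critical(merge_max)
        {
            if (local_max > shared_max)
                shared_max = local_max;
        }
    }
    return shared_max;
}
```

With the arrays from the question, `parallel_max(arx, N)` and `parallel_max(ary, N)` would replace the two racy comparisons. Compilers that implement OpenMP 3.1 or later also accept a `reduction(max:...)` clause for this purpose; the sketch above avoids it because older implementations (OpenMP 2.0) do not support it. If OpenMP is not enabled, the pragmas are ignored and the function runs serially with the same result.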

Jim Dempsey