# Introduction to Parallel Programming for Shared Memory Parallelism (Intel)

Hi,

Some of my students at USC have more questions about how Solution #3 on slide 25 of the PPT Part4_ConfrontingRaceConditions works. One of the major questions is why the tmp variable below does not cause a race condition. Could you describe further how the code below works better in parallel than Solution #2, where the threads are created, where they are executed, and why the use of tmp does not cause a race condition?

Thanks

Professor Jose Villeta

EE-CSCI 452 Game Hardware Architectures

    double area, pi, tmp, x;
    int i, n;
    ...
    area = 0.0;
    #pragma omp parallel private(tmp)
    {
        tmp = 0.0;
        #pragma omp for private(x)
        for (i = 0; i < n; i++) {
            x = (i + 0.5)/n;
            tmp += 4.0/(1.0 + x*x);
        }
        #pragma omp critical
        area += tmp;
    }
    pi = area / n;

Why is this better?

In Solution #2, the full computation of the value for tmp is taken out of the critical region, so each thread spends less time inside the critical region than in Solution #1. However, the critical region must still be entered and executed once per loop iteration.
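Since the slide for Solution #2 is not reproduced in this thread, here is a minimal sketch of what that pattern might look like, reconstructed only from the description above. The function name pi_solution2 and the exact clauses are assumptions for illustration, not the slide's code.

```c
#include <assert.h>

/* Hypothetical reconstruction of the Solution #2 pattern: tmp is computed
 * outside the critical region, but the critical region is still entered
 * once per loop iteration. Compiles as serial code if OpenMP is disabled. */
double pi_solution2(int n)
{
    double area = 0.0, tmp, x;

    assert(n > 0);

    #pragma omp parallel for private(x, tmp)
    for (int i = 0; i < n; i++) {
        x = (i + 0.5) / n;
        tmp = 4.0 / (1.0 + x * x);  /* work done without holding the lock */

        #pragma omp critical
        area += tmp;                /* n entries in total across all threads */
    }

    return area / n;
}
```

Calling pi_solution2(1000) should return a value close to pi; the point of the sketch is only that the critical region runs n times.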

In Solution #3, the critical region is moved out of the loop. This is done by creating a parallel region (rather than a parallel for region, as in #1 and #2) around the for-loop. Each thread, after computing its assigned loop iterations, holds a partial result in its private tmp variable, which it then adds into the shared area variable inside the critical region. The critical region is therefore entered only once per thread rather than once per iteration, which eliminates the per-iteration overhead and reduces the time threads spend waiting to enter the critical region.
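One way to check the "once per thread" claim is to count how often the critical region runs. The sketch below follows the Solution #3 code from the question, with a counter added inside the critical region; the function name pi_solution3_counted and the entries parameter are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Solution #3 pattern with an added counter (an assumption for this sketch).
 * Returns the pi estimate; *entries receives the number of times the
 * critical region was executed. */
double pi_solution3_counted(int n, int *entries)
{
    double area = 0.0, tmp, x;
    int count = 0;

    assert(n > 0);
    assert(entries != NULL);

    #pragma omp parallel private(tmp, x)
    {
        tmp = 0.0;                  /* each thread starts its own partial sum */

        #pragma omp for
        for (int i = 0; i < n; i++) {
            x = (i + 0.5) / n;
            tmp += 4.0 / (1.0 + x * x);
        }

        #pragma omp critical
        {
            area += tmp;            /* one update per thread */
            count++;                /* protected by the same critical region */
        }
    }

    *entries = count;
    return area / n;
}
```

With OpenMP enabled, entries should come back equal to the number of threads in the team; built without OpenMP, the pragmas are ignored and it comes back as 1. Either way it does not grow with n.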

In Solution #3, the threads are created at the #pragma omp parallel line, and every thread executes the block that follows it. The private clause creates a local copy of the tmp variable for each thread, and each thread uses only its own copy during the computations of the for-loop. Because no thread ever reads or writes another thread's tmp, there is no race on tmp. The iterations of the loop are split up and assigned to threads via the #pragma omp for.
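To make the per-thread copies of tmp visible, the following sketch stores each thread's private partial sum in its own array slot instead of adding it into area. This is an illustration under assumptions, not part of the slides: collect_partials, MAX_THREADS, and the num_threads cap are hypothetical, and the serial fallbacks exist only so the code also builds without OpenMP.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 4   /* hypothetical cap so the array is large enough */

/* Fills partials[t] with thread t's private tmp and returns the team size. */
int collect_partials(int n, double partials[MAX_THREADS])
{
    int nthreads = 1;
    double tmp, x;

    assert(n > 0);
    for (int t = 0; t < MAX_THREADS; t++)
        partials[t] = 0.0;

    /* Threads are created here; each one gets its own tmp and x. */
    #pragma omp parallel private(tmp, x) num_threads(MAX_THREADS)
    {
        int tid = omp_get_thread_num();
        tmp = 0.0;

        /* The iterations are divided among the threads of the team. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            x = (i + 0.5) / n;
            tmp += 4.0 / (1.0 + x * x);
        }

        partials[tid] = tmp;        /* each thread writes only its own slot */
        if (tid == 0)
            nthreads = omp_get_num_threads();
    }

    return nthreads;
}
```

Summing the entries of partials and dividing by n should give the same estimate of pi as the original code, which shows that the private copies never interfere with one another.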

--clay

Clay - Thank you for providing that excellent and clear explanation for Professor Villeta's students.

Paul

Thanks Clay for the extra insight on this solution!

It's pretty clear now how the tmp variable is being used during the parallel execution of the code.

Cheers

jose