Advice on OpenMP code please

I am seeking advice about some OpenMP code I have written. (This is my first foray into OpenMP programming.) I would like to know (a) whether it is legal code (it currently compiles and gives the expected answers) and (b) whether my handling of the 4 threads is optimal.

I have a data array defined as

real, allocatable :: d(:,:)

in a module. It is later allocated with

allocate(d(n,n))

but actually used as a len x len array.

As it happens, the most time-consuming part of the calculation can be run as 4 parallel threads which take identical times (to within a few ms) and use the same amount of memory. Each thread produces just 3 integers as its final result, and these are stored in an array howmany(15). (The first 3 elements of howmany() hold other data.) I have written the following code to handle this situation on computers with >= 4, 3, 2 and 1 cores.

! Allow program to modify the number of threads
      call omp_set_dynamic(.true.)
! How many cores to use?
      i = omp_get_max_threads()
      NumThreads = min0(4, i)
      if (len <= 50) NumThreads = 1   ! <= 50 not worth the overheads
      call omp_set_num_threads(NumThreads)
! Now the parallel code
      if (NumThreads == 1) then
        ! No multicore processor available, so 1 thread at a time.
        call count_clusters_OMP(1,d,n,len,lower,upper,.true.,NumThreads,ierr)
        call count_clusters_OMP(2,d,n,len,lower,upper,.true.,NumThreads,ierr)
        call count_clusters_OMP(3,d,n,len,lower,upper,.true.,NumThreads,ierr)
        call count_clusters_OMP(4,d,n,len,lower,upper,.true.,NumThreads,ierr)
      ! 2 or 3 cores
      else if (NumThreads == 2 .or. NumThreads == 3) then
        ! 1st 2 calculations
!$OMP PARALLEL SECTIONS COPYIN(d, howmany, maxval, minval, n, len)
!$OMP SECTION
        call count_clusters_OMP(1,d,n,len,lower,upper,.true.,NumThreads,ierr)
!$OMP SECTION
        call count_clusters_OMP(2,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP END PARALLEL SECTIONS
        ! Second set
!$OMP PARALLEL SECTIONS COPYIN(d, howmany, maxval, minval, n, len)
!$OMP SECTION
        call count_clusters_OMP(3,d,n,len,lower,upper,.true.,NumThreads,ierr)
!$OMP SECTION
        call count_clusters_OMP(4,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP END PARALLEL SECTIONS
      ! 4 cores (or more)
      else if (NumThreads == 4) then
!$OMP PARALLEL SECTIONS COPYIN(d, howmany, maxval, minval, n, len)
!$OMP SECTION
        call count_clusters_OMP(1,d,n,len,lower,upper,.true.,NumThreads,ierr)
!$OMP SECTION
        call count_clusters_OMP(2,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP SECTION
        call count_clusters_OMP(3,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP SECTION
        call count_clusters_OMP(4,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP END PARALLEL SECTIONS
      endif

(lower, upper, minval and maxval are scalars.)

A colleague of mine has criticised this code on 2 grounds:

  1. It is not legal Fortran.
  2. I should not manage the threads as I do but let the software do this.
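
One possible way to follow the colleague's second suggestion is to replace the hand-written branches with a single worksharing loop over the four tasks and let the OpenMP runtime map them onto however many threads it provides. This is a sketch under stated assumptions, not a drop-in replacement: `fake_count_clusters` is a hypothetical stand-in for `count_clusters_OMP` (whose body was not posted), and the sketch assumes, as described later in this thread, that each task writes only its own block of `d` and its own three `howmany` slots, so both arrays can stay shared and no COPYIN is needed. The `IF` clause takes over the role of the `len <= 50` serial branch, and the `!$` conditional-compilation sentinel lets the module also build without OpenMP.

```fortran
module cluster_driver
  !$ use omp_lib
  implicit none
contains
  ! Hypothetical stand-in for count_clusters_OMP: task k updates only
  ! columns (k-1)*len/4+1 .. k*len/4 of d and only howmany(3*k+1:3*k+3).
  subroutine fake_count_clusters(k, d, len, howmany)
    integer, intent(in)    :: k, len
    real,    intent(inout) :: d(:,:)
    integer, intent(inout) :: howmany(:)
    integer :: c0, c1
    c0 = (k - 1) * len / 4 + 1
    c1 = k * len / 4
    d(1:len, c0:c1) = d(1:len, c0:c1) + real(k)
    howmany(3*k+1:3*k+3) = [k, c0, c1]
  end subroutine fake_count_clusters

  ! Run the four independent tasks. The runtime decides how many threads
  ! (at most 4) execute them; small problems run serially via IF(len > 50).
  ! No COPYIN: d and howmany are shared and each task writes disjoint parts.
  subroutine run_tasks(d, len, howmany)
    real,    intent(inout) :: d(:,:)
    integer, intent(in)    :: len
    integer, intent(inout) :: howmany(:)
    integer :: k, nthreads
    nthreads = 1
    !$ nthreads = min(4, omp_get_max_threads())
    !$omp parallel do if(len > 50) num_threads(nthreads) schedule(static, 1) shared(d, howmany)
    do k = 1, 4
      call fake_count_clusters(k, d, len, howmany)
    end do
    !$omp end parallel do
  end subroutine run_tasks
end module cluster_driver
```

With 2 or 3 threads, `schedule(static, 1)` simply gives some threads two tasks, which matches the two rounds of the original 2/3-core branch without a separate code path.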

Can I solicit an expert opinion please?

With thanks

Chris G

If d is a read-only (input) array, .OR. if the first argument of count_clusters_OMP partitions d into independent sections, then d need not be copied in (and need not be private). Please indicate which variables get modified. Are any of these arguments, and/or are any of them global variables?

You can vary the thread count with static (non-dynamic) scheduling as well. Dynamic alters how partitions are generated and is typically used when equal partitions do not yield equal work.
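
It may help to separate two different "dynamic" settings. `omp_set_dynamic(.true.)`, as called in the original code, allows the runtime to give a parallel region fewer threads than requested; the `SCHEDULE(DYNAMIC)` clause, in contrast, controls how loop iterations are handed out to threads. For four tasks of equal cost, a static schedule with dynamic thread adjustment switched off is a natural fit. The minimal sketch below assumes that setup; `run_static` and its `owner` array are hypothetical names used only to record which thread ran which task.

```fortran
module static_sched_demo
  !$ use omp_lib
  implicit none
contains
  ! Run 4 equal-cost tasks on a fixed-size team with a static schedule.
  ! owner(k) records which thread ran task k; nused is the team size.
  subroutine run_static(owner, nused)
    integer, intent(out) :: owner(4), nused
    integer :: k, nreq
    owner = 0
    nused = 1
    nreq = 1
    !$ nreq = min(4, omp_get_max_threads())
    ! Forbid the runtime from giving the region fewer threads than nreq.
    ! Note: this changes a global setting for later parallel regions too.
    !$ call omp_set_dynamic(.false.)
    !$omp parallel num_threads(nreq) shared(owner, nused)
    !$omp single
    !$ nused = omp_get_num_threads()
    !$omp end single
    ! schedule(static, 1) deals tasks out round-robin:
    ! task k goes to thread mod(k-1, nused).
    !$omp do schedule(static, 1)
    do k = 1, 4
      !$ owner(k) = omp_get_thread_num()
    end do
    !$omp end do
    !$omp end parallel
  end subroutine run_static
end module static_sched_demo
```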

Jim Dempsey

www.quickthreadprogramming.com

Thank you Jim.

d is modified by each thread. Nothing else gets modified; e.g. the variables maxval, minval, n and len are not changed.

The array howmany(15) has elements changed by each thread. If j is the thread number (j = 1, 4), then

howmany((j-1)*3 +4), howmany((j-1)*3 +5) and howmany((j-1)*3 +6) are written by thread j.
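
Since (j-1)*3 + 4 = 3j + 1, thread j writes `howmany(3*j+1:3*j+3)`: slots 4-6 for j = 1 through 13-15 for j = 4, never touching the first three elements and never overlapping another thread's slots. That is what allows `howmany` to remain shared. The small hypothetical helpers below simply encode the formula so the non-overlap can be checked mechanically.

```fortran
module howmany_slots
  implicit none
contains
  ! First and last howmany() index written by task j (j = 1..4),
  ! i.e. (j-1)*3+4 .. (j-1)*3+6, which simplifies to 3*j+1 .. 3*j+3.
  pure subroutine slot_range(j, first, last)
    integer, intent(in)  :: j
    integer, intent(out) :: first, last
    first = (j - 1) * 3 + 4
    last  = (j - 1) * 3 + 6
  end subroutine slot_range

  ! True if no two tasks share a slot and none touches howmany(1:3).
  logical function slots_disjoint() result(ok)
    logical :: used(15)
    integer :: j, first, last
    used = .false.
    ok = .true.
    do j = 1, 4
      call slot_range(j, first, last)
      if (first < 4 .or. last > 15) ok = .false.
      if (any(used(first:last))) ok = .false.
      used(first:last) = .true.
    end do
  end function slots_disjoint
end module howmany_slots
```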

The equal partitions I have created do yield almost exactly equal work.

ChrisG

Does any thread use as input a howmany element that resides in the write domain of a different thread? That is, is there a dependency of the form

X(j) = fn(X(j-1)) or X(j) = fn(X(j+1)), where the read reference lies within the write reference of a different thread?

If NOT, then d need not be copied. If SO, then a copy of d may or may not resolve the issue, as there may be temporal issues with respect to the order of execution; a closer examination of the code would be required to construct a working parallel solution.
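
To make the distinction concrete: a loop in which each iteration reads only its own element can be split across threads in any order, while a recurrence such as X(j) = fn(X(j-1)) reads a value that another iteration writes, so splitting it across threads could read stale data. In the minimal sketch below, fn is an arbitrary made-up function (2x + 1), used purely for illustration.

```fortran
module dependency_demo
  implicit none
contains
  ! Independent: x(j) depends only on its own old value, so the
  ! iterations may run on different threads in any order.
  subroutine independent_update(x)
    real, intent(inout) :: x(:)
    integer :: j
    !$omp parallel do
    do j = 1, size(x)
      x(j) = 2.0 * x(j) + 1.0
    end do
    !$omp end parallel do
  end subroutine independent_update

  ! Dependent: x(j) = fn(x(j-1)) reads the element written by the
  ! previous iteration, so this loop must stay serial as written.
  subroutine recurrence_update(x)
    real, intent(inout) :: x(:)
    integer :: j
    do j = 2, size(x)
      x(j) = 2.0 * x(j - 1) + 1.0
    end do
  end subroutine recurrence_update
end module dependency_demo
```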

Jim Dempsey

Is my use of copyin legal? The Intel documentation says:

"Parallel Directive Clause: Specifies that the data in the master thread of the team is to be copied to the thread private copies of the common block at the beginning of the parallel region.

COPYIN (list)

list

Is the name of one or more variables or common blocks that are accessible to the scoping unit. Subobjects cannot be specified. Each name must be separated by a comma, and a named common block must appear between slashes (/ /).

The COPYIN clause applies only to common blocks declared as THREADPRIVATE.

You do not need to specify the whole THREADPRIVATE common block, you can specify named variables within the common block"

I have not used common blocks or threadprivate, yet the code works!
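
The quoted documentation is the key point: COPYIN is defined only for THREADPRIVATE data (in OpenMP this covers THREADPRIVATE module variables as well as common blocks). Listing ordinary variables in COPYIN, as the original code does, is therefore non-conforming; a compiler may still accept it, which could explain why the code works, but another compiler is free to reject it. For contrast, here is a minimal sketch of the intended use, with a hypothetical module variable `tp_factor` made THREADPRIVATE and copied from the master thread into each thread's copy.

```fortran
module tp_demo
  implicit none
  ! A module variable made THREADPRIVATE: each thread has its own copy.
  real :: tp_factor = 0.0
  !$omp threadprivate(tp_factor)
contains
  ! COPYIN copies the master thread's value of tp_factor into every
  ! thread's private copy at the start of the parallel region.
  subroutine check_copyin(v, all_match)
    real,    intent(in)  :: v
    logical, intent(out) :: all_match
    integer :: mismatches
    tp_factor = v            ! set on the master (encountering) thread
    mismatches = 0
    !$omp parallel copyin(tp_factor) reduction(+:mismatches)
    if (tp_factor /= v) mismatches = mismatches + 1
    !$omp end parallel
    all_match = (mismatches == 0)
  end subroutine check_copyin
end module tp_demo
```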

Would it be better to make my own copies of the d array (say d2(), d3()... ) and use these explicitly in the calls to count_clusters_OMP() as needed?

ChrisG

You stated that each independent section of d is modified by only one thread, .AND. you also indicated that sections of d have no time-based dependencies on other sections of d. Therefore d can, and should, be modified directly by each thread.

You can get into trouble by copying data when it should not be copied. Pseudocode:

Thread 0                    Thread 1
copy all of d               copy all of d
modify half of d            modify other half of d
restore copy of d to d      restore copy of d to d

In the above case, you might see only the update made by whichever thread restores its copy of d last.
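
Because the failure in this pseudocode depends on timing, it is hard to reproduce reliably with real threads. The hypothetical routine `replay_copy_restore` below replays one bad interleaving serially: both "threads" copy d before either restores, and then each writes its whole copy back. Only the last restore survives, so thread 0's update is lost. The equally hypothetical `update_in_place` shows the direct alternative, where each thread touches only its own half of the shared array.

```fortran
module lost_update_demo
  implicit none
contains
  ! Serial replay of: both threads copy all of d, each modifies its own
  ! half of its copy, then each writes its whole copy back into d.
  subroutine replay_copy_restore(d)
    real, intent(inout) :: d(:)
    real, allocatable :: copy0(:), copy1(:)
    integer :: half
    half = size(d) / 2
    copy0 = d                              ! thread 0: copy all of d
    copy1 = d                              ! thread 1: copy all of d
    copy0(1:half) = copy0(1:half) + 1.0    ! thread 0: modify first half
    copy1(half+1:) = copy1(half+1:) + 1.0  ! thread 1: modify other half
    d = copy0                              ! thread 0 restores its copy
    d = copy1                              ! thread 1 restores last, overwriting thread 0's work
  end subroutine replay_copy_restore

  ! Direct approach: each thread updates only its own half of the shared d.
  subroutine update_in_place(d)
    real, intent(inout) :: d(:)
    integer :: half, t
    half = size(d) / 2
    !$omp parallel do
    do t = 0, 1
      if (t == 0) then
        d(1:half) = d(1:half) + 1.0
      else
        d(half+1:) = d(half+1:) + 1.0
      end if
    end do
    !$omp end parallel do
  end subroutine update_in_place
end module lost_update_demo
```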

Jim Dempsey
