[OpenMP] Threadprivate outside main program and copyin clause

Dear all,

I have an issue with a Fortran code that I am trying to parallelize using OpenMP. This is a code for Monte Carlo particle transport and it has several COMMON blocks that I marked as private using the THREADPRIVATE directive. A (very very very small) example of this code could be:

      PROGRAM TUTOR2
      IMPLICIT NONE

C     [ VARIABLES DEFINITIONS, COMMON BLOCKS, ETC ]

C$OMP PARALLEL DO
      DO I=1,NCASE
         CALL SHOWER([ARGS])
      END DO
C$OMP END PARALLEL DO

      END PROGRAM TUTOR2

      SUBROUTINE SHOWER
      IMPLICIT NONE

C     [ VARIABLES DEFINITIONS, COMMON BLOCKS, ETC ]

C$OMP THREADPRIVATE(/RANDOMM/)

C     [ A LOT OF CODE ]

      RETURN
      END
 

Well, as you can see, the COMMON block /RANDOMM/ is first declared inside a SUBROUTINE (in this case SHOWER), not in the main program, and this happens with many COMMON blocks in this code. The variables in the RANDOMM block are initialized in another subroutine before the PARALLEL DO directive, so I need to copy the values of the RANDOMM block from the master thread to the slave threads. But because the block is not declared in the main program, where the PARALLEL DO is, I cannot use the COPYIN clause there.

Is there a better approach to this problem than declaring all the needed COMMON blocks (as THREADPRIVATE) in the main program? Thanks for your help!
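For comparison, the arrangement the question would like to avoid can be sketched as follows: the COMMON block is also declared, with its THREADPRIVATE directive, in the main program, which makes COPYIN(/RANDOMM/) legal on the PARALLEL DO. OpenMP requires the THREADPRIVATE directive in every program unit that declares the block, so it is repeated in each unit. The members SEED and U, the value 12345, and the subroutines INIT_RANDOMM and SHOWER(ICASE, SEED_SEEN) are hypothetical stand-ins, not the real code; the program stops with an error if any case sees an uninitialized copy.

```fortran
! Sketch only: SEED, U, INIT_RANDOMM and SHOWER are hypothetical stand-ins.
! The common block and its THREADPRIVATE directive are repeated in every
! program unit that uses /RANDOMM/, including the main program.
program tutor_copyin
  implicit none
  integer, parameter :: ncase = 16
  integer :: seed
  real :: u
  integer :: i
  integer :: seen(ncase)
  common /randomm/ seed, u
  !$omp threadprivate(/randomm/)

  call init_randomm()        ! master thread fills in its copy of /RANDOMM/

  ! COPYIN(/RANDOMM/) is allowed here because the block is declared in this
  ! unit; it copies the master's values into every thread's private copy
  !$omp parallel do copyin(/randomm/)
  do i = 1, ncase
     call shower(i, seen(i))
  end do
  !$omp end parallel do

  if (any(seen /= 12345)) error stop 'a thread saw an uninitialized /RANDOMM/'
  print *, 'all', ncase, 'cases saw seed =', seen(1)
end program tutor_copyin

subroutine init_randomm()
  implicit none
  integer :: seed
  real :: u
  common /randomm/ seed, u
  !$omp threadprivate(/randomm/)
  seed = 12345
  u = 0.0
end subroutine init_randomm

subroutine shower(icase, seed_seen)
  implicit none
  integer, intent(in) :: icase
  integer, intent(out) :: seed_seen
  integer :: seed
  real :: u
  common /randomm/ seed, u
  !$omp threadprivate(/randomm/)
  seed_seen = seed           ! read this thread's copy of /RANDOMM/
  u = u + real(icase)        ! per-thread state may evolve independently
end subroutine shower
```

Compiled without OpenMP the directives are ignored and the program simply runs serially, which is a convenient way to check that the parallel version gives the same results.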


Place the COMMON /RANDOMM/ ... statement and the C$OMP THREADPRIVATE(/RANDOMM/) directive into a module, then USE that module instead of declaring the COMMON block in the subroutine (and everywhere else).

C     mod_RANDOM.F
      module mod_RANDOM
      COMMON /RANDOMM/ A,B,C,..
C$OMP THREADPRIVATE(/RANDOMM/)
      end module mod_RANDOM
--------------------------

      subroutine FOO
      USE mod_RANDOM
      IMPLICIT NONE
C     Comment out old COMMON
C     COMMON /RANDOMM/ A, B, C

Jim Dempsey

www.quickthreadprogramming.com

Thanks for the tip, Jim, but the problem is that I then need the COPYIN clause to copy the master's values to the slave threads, and I cannot write COPYIN(/RANDOMM/) when the /RANDOMM/ block is declared inside the module. I could list all the variables inside the block manually, but since this is not the only COMMON block I must handle with COPYIN, that approach soon becomes impractical...

Jim's approach works if you use mod_RANDOM in your main program and list all the variables you need to copy from /randomm/ in the copyin clause of your parallel region. I assume you object to that because of the number of variables that would need to be in the copyin clause?
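A minimal sketch of the module approach combined with a member-by-member COPYIN, assuming a three-member block A, B, C (placeholders for the real members; the module name follows Jim's example and SHOWER is a hypothetical stand-in). As I understand the OpenMP rules, a common block name in COPYIN must be declared as a common block in the same scoping unit, whereas individual members of a threadprivate common block may be listed, which is why the use-associated members appear one by one here.

```fortran
! Sketch: the COMMON block and its THREADPRIVATE directive live only in the
! module. Members A, B, C are placeholders for the real /RANDOMM/ contents.
module mod_random
  implicit none
  real :: a, b, c
  common /randomm/ a, b, c
  !$omp threadprivate(/randomm/)
end module mod_random

program copyin_members
  use mod_random
  implicit none
  integer, parameter :: ncase = 16
  integer :: i
  real :: sums(ncase)

  a = 1.0; b = 2.0; c = 3.0          ! master thread initializes its copy

  ! /RANDOMM/ is only use-associated here, so the members are listed
  ! individually instead of writing COPYIN(/RANDOMM/)
  !$omp parallel do copyin(a, b, c)
  do i = 1, ncase
     call shower(sums(i))
  end do
  !$omp end parallel do

  if (any(sums /= 6.0)) error stop 'COPYIN did not reach every thread'
  print *, 'every case saw a+b+c =', sums(1)
end program copyin_members

subroutine shower(total)
  use mod_random
  implicit none
  real, intent(out) :: total
  total = a + b + c                  ! reads this thread's private copy
end subroutine shower
```

The cost is exactly the one raised in the thread: every member that needs the master's value has to appear in the COPYIN list.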

If I were writing this in modern Fortran, I would put the variables of the common block in a user-defined type and pass an instance of that type as a procedure argument wherever it is needed. For the parallel region, I would make the instance firstprivate. A simple example follows:

    module randomm_mod
        implicit none

        type randomm_t
            real :: a, b, c
        end type randomm_t
    end module randomm_mod

    program OMP_Test
    use omp_lib
    use randomm_mod

    implicit none

    type(randomm_t) :: randomm

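    call omp_set_num_threads(4)
    call set_randomm(randomm)      ! master fills in the values

    ! firstprivate: each thread starts with its own copy of the master's randomm
    !$omp parallel firstprivate(randomm)
    call use_randomm(randomm)
    !$omp end parallel

    ! the per-thread copies are discarded, so this still shows a=1, b=2, c=3
    print*, randomm%a, randomm%b, randomm%c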
    end program OMP_Test

    subroutine set_randomm(this)
        use randomm_mod
        type(randomm_t), intent(out) :: this

        this%a = 1
        this%b = 2
        this%c = 3
    end subroutine set_randomm

    subroutine use_randomm(this)
        use randomm_mod
        use omp_lib

        type(randomm_t), intent(inout) :: this

        this%a = this%a + omp_get_thread_num()
        print*, this%a, this%b, this%c
    end subroutine use_randomm

 

Yes, my complaint is that I would have to list explicitly every variable from RANDOMM (and the other COMMON blocks) that needs the COPYIN clause, instead of just writing the COMMON block name. But OK, I can analyze which variables inside the COMMON blocks need COPYIN and list only those. Although the COPYIN clause will be quite long, the rest of the code will look much nicer without so many COMMON block declarations...

Thanks for the tips!

 

A variation on MarkLewy's suggestion, for when you need only a few items, is to use the large user-defined type container as he outlines, but then declare additional types holding the various subsets of variables. To this add a defined assignment (a generic interface for assignment(=); Fortran does not allow operator(=)) that takes the full data set on the right-hand side. (I will let you design the subset types and the assignment procedure.)

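!$omp parallel private(subset)
subset = fullset     ! defined assignment: copy only the members this subset holds
!$omp do
do i=1, ...
  x = subset%memberVar...
end do
!$omp end do
!$omp end parallel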
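Since the design is left open, here is one possible sketch, with every name hypothetical (FULL_T, SUBSET_T, SUBSET_FROM_FULL and their members). The defined assignment is provided through a generic ASSIGNMENT(=) interface, and each thread's PRIVATE SUBSET is filled from the shared FULLSET at the start of the region, so only the members the loop needs are copied.

```fortran
! Hypothetical design: type names, members and values are illustrative only.
module randomm_sets
  implicit none
  private
  public :: full_t, subset_t, assignment(=)

  type :: full_t                      ! everything that used to live in /RANDOMM/
     real    :: a = 0.0, b = 0.0, c = 0.0
     integer :: seed = 0
  end type full_t

  type :: subset_t                    ! only what the parallel loop needs
     real    :: a = 0.0
     integer :: seed = 0
  end type subset_t

  interface assignment(=)             ! enables "subset = fullset"
     module procedure subset_from_full
  end interface

contains

  subroutine subset_from_full(lhs, rhs)
    type(subset_t), intent(out) :: lhs
    type(full_t),   intent(in)  :: rhs
    lhs%a    = rhs%a                  ! copy just the members the subset holds
    lhs%seed = rhs%seed
  end subroutine subset_from_full

end module randomm_sets

program subset_demo
  use randomm_sets
  implicit none
  integer, parameter :: n = 8
  type(full_t)   :: fullset
  type(subset_t) :: subset
  real    :: x(n)
  integer :: i

  fullset = full_t(a=1.5, b=2.0, c=3.0, seed=42)   ! intrinsic assignment

  !$omp parallel private(subset)
  subset = fullset                    ! each thread calls subset_from_full
  !$omp do
  do i = 1, n
     x(i) = subset%a * i
  end do
  !$omp end do
  !$omp end parallel

  if (any(x /= [(1.5 * i, i = 1, n)])) error stop 'subset copy failed'
  print *, x
end program subset_demo
```

Keeping the copy logic in one assignment procedure means that adding a member to a subset touches only SUBSET_T and SUBSET_FROM_FULL, not every parallel region.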

Jim Dempsey

Thanks for all the comments. As a first step I used the module approach plus COPYIN for the needed variables. The COPYIN list took about 5 lines of code; it does not look pretty, but it works... now I have a fully operational MC code parallelized with OpenMP! I must do further analysis to check that everything is OK and to improve the performance further (so far I am getting a 3x speed-up on a Core i5-3317U), but at least the results are consistent with the original code. Thanks for your help!

PS: Jim, I found the "Chronicles of Phi", now reading...

3x speedup is great. Don't worry about not looking pretty. The Chronicles of Phi are all about not being pretty, and all about getting performance.

The techniques in that article apply not only to Xeon Phi, but also to host processors with Hyper-Threading. For that series of articles my system had two Xeon Phis, though only one was used for the article. The host was a 1P system: E5-2620 v2 (6 cores, 12 threads). I am having a colleague run some tests on 2P and 4P systems. The preliminary results are significantly better than I expected. When I get the data together I will add it as a supplement to the blog series.

Jim Dempsey
