Do Fortran user-defined types decrease performance?

Hi,
I have a piece of code that uses Fortran user-defined types quite heavily. I find that having the "%" sign in the inner loops can significantly increase the number of memory loads and decrease performance, but I don't know why. I would truly appreciate your help in finding the reason. Here are two short versions of the code:
Original Version:

  subroutine ARK2(region)

    ! ... Incoming variables
    type(t_region), pointer :: region

    ! ... local variables
    integer :: rkStep, i, j, k, ng, ARK2_nStages, ImplicitFlag
    type(t_mixt), pointer :: state
    state => region%state

    do i = 1, region%grid%nCells
      state%time(i) = state%timeOld(i) + state%dt(i)
    end do
  end subroutine ARK2

Optimized Version:

  subroutine ARK2(region)

    ! ... Incoming variables
    type(t_region), pointer :: region

    ! ... local variables (all declarations must precede executable statements)
    integer :: rkStep, i, j, k, ng, ARK2_nStages, ImplicitFlag
    type(t_mixt), pointer :: state
    real(kind=8), pointer :: time(:), timeOld(:), dt(:)

    ! ... dereference pointers
    state => region%state
    time => state%time
    timeOld => state%timeOld
    dt => state%dt

    do i = 1, region%grid%nCells
       time(i) = timeOld(i) + dt(i)
    end do
  end subroutine ARK2

Please note that the only change I made in the optimized version is defining some local pointers to remove the "%" sign from the inner loop. What surprised me is that the original version executes twice as many load instructions (measured with TAU and PAPI) as the optimized one, and the optimized one runs twice as fast. I am very curious about how this happens and how the compiler deals with the "%" sign in Fortran user-defined types. Many thanks for your suggestions!

With pointers, the compiler has to do an indirect reference each time you access a component, as it doesn't know if the pointer may be aliased.

Retired 12/31/2016
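To illustrate the aliasing concern, here is a minimal sketch (the module name `alias_demo` and the routine `shift_add` are hypothetical and not part of the original code). Because both dummy arguments are pointers, the caller is allowed to associate them with overlapping storage, so the compiler cannot assume that a store through `dst` leaves `src` unchanged.

```fortran
module alias_demo
  implicit none
contains
  ! Hypothetical example routine: stores src(i) + 1 into dst(i).
  ! Both dummies have the POINTER attribute, so they may legally
  ! refer to overlapping memory.
  subroutine shift_add(dst, src, n)
    real(kind=8), pointer :: dst(:), src(:)
    integer, intent(in) :: n
    integer :: i
    do i = 1, n
      ! If dst and src overlap, this store can change a later src(i),
      ! so the loads cannot simply be hoisted or reordered.
      dst(i) = src(i) + 1.0d0
    end do
  end subroutine shift_add
end module alias_demo
```

For example, with `buf` zero-initialized and of size 5, associating `src => buf(1:4)` and `dst => buf(2:5)` and calling `shift_add(dst, src, 4)` leaves `buf` as 0, 1, 2, 3, 4: each iteration reads the value written by the previous one. A compiler that assumed no overlap would be free to produce a different result, which is why it has to stay conservative with pointers.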

Quote:

Steve Lionel (Intel) wrote:

With pointers, the compiler has to do an indirect reference each time you access a component, as it doesn't know if the pointer may be aliased.

Hi Steve,

Thank you so much for your reply. Could you please explain in more detail? I think both state%time and time are pointers. Why, then, does the "%" sign lead to more load instructions and slow down the code?

Many thanks!

Can you show the declaration of type t_mixt?

Quote:

Steve Lionel (Intel) wrote:

Can you show the declaration of type t_mixt?

Hi Steve,

Thanks for your reply. Here is the declaration of type t_mixt:

  TYPE t_mixt
    INTEGER :: NVARS, ND, numproc, myrank
    REAL(RFREAL) :: RE, REinv, PR, PRinv, SC, SCinv, xshock
    REAL(RFREAL) :: DTAU, DTAUold, InitialRHS, CurrentRHSMax, AveHtFluxOld
    REAL(RFREAL), POINTER :: time(:), dt(:), timeOld(:), cfl(:)
    REAL(RFREAL), POINTER :: cv(:,:), cvOld(:,:), dv(:,:), tv(:,:), gv(:,:), cvOld2(:,:), cvOld1(:,:)
    REAL(RFREAL), POINTER :: rhs(:,:), rk_rhs(:,:,:), cvTarget(:,:), cvTargetOld(:,:), RHSz(:,:)
    REAL(RFREAL), POINTER :: VelGrad1st(:,:), TempGrad1st(:,:), MagStrnRt(:), tvCor(:,:)
    REAL(RFREAL), POINTER :: flux(:), dflux(:), muT(:), PrT(:), SGS_KE(:)
    REAL(RFREAL) :: MaxHyperMu, MaxHyperBeta, MaxHyperKappa ! ... maximum hyperviscosity at each time step
    REAL(RFREAL), POINTER :: auxVars(:,:), auxVarsOld(:,:)
    REAL(RFREAL), POINTER :: rhs_AuxVars(:,:), rk_rhs_AuxVars(:,:,:)
    REAL(RFREAL), POINTER :: levelSet(:), levelSetOld(:), rhs_levelSet(:), rk_rhs_levelSet(:,:)
    REAL(RFREAL), POINTER :: precond(:,:), auxVarsTarget(:,:), rhs_explicit(:,:,:), rhs_implicit(:,:,:)
    REAL(RFREAL), POINTER :: rhs_auxVars_explicit(:,:,:), rhs_auxVars_implicit(:,:,:)
    ! - Finite Volume
    TYPE(t_fvsweep), pointer :: sweep(:)
    ! - IO buffers
    REAL(RFREAL), POINTER :: DBUF_IO(:,:,:,:)
    INTEGER, POINTER :: IBUF_IO(:,:,:)
    ! - Adjoint N-S
    REAL(RFREAL), POINTER :: av(:,:), avOld(:,:)
    REAL(RFREAL), POINTER :: avTarget(:,:), avTargetOld(:,:)
    REAL(RFREAL), POINTER :: cvNew(:,:)
    INTEGER :: num_cvFiles, numCur_cvFile
    INTEGER, POINTER :: iter_cvFiles(:)
    REAL(RFREAL), POINTER :: time_cvFiles(:)
    ! ... post-processing
    REAL(RFREAL), POINTER :: pp(:,:), Vort(:,:), Dilat(:)
    ! ... spline coefficients for EOS and transport variables
    TYPE(t_spline), POINTER :: dvSpline(:)   ! Cv(T), Cp(T), Gamma(T), Z(T), T(e_int) (in that order)
    TYPE(t_spline), POINTER :: tvSpline      ! mu(T), lambda(T), k(T)
  END TYPE t_mixt

Please note that RFREAL is simply kind=8, i.e. double precision. Thanks!

Thanks - since time is itself a pointer, you have double-indirection in the initial case. The compiler can do a decent job of optimizing single-level references to pointers, but double-level indirections are more complex.

Do you need to use POINTER for all of these? It seems to me that ALLOCATABLEs might work better for many if not all of the cases where you use POINTER. The compiler can deal better with ALLOCATABLE, though you still have the double-indirection issue.
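As a rough sketch of the ALLOCATABLE suggestion, assuming the arrays are owned by the type and never need to point at other storage: the type `t_mixt_alloc` and the routines `init_state` and `advance_time` below are hypothetical, a cut-down stand-in for three components of `t_mixt` rather than the real declaration. `RFREAL` is assumed here to be the double-precision kind, as stated above.

```fortran
module mixt_alloc_demo
  implicit none
  ! Assumption: RFREAL is the double-precision kind.
  integer, parameter :: RFREAL = kind(1.0d0)

  ! Hypothetical reduced variant of t_mixt with ALLOCATABLE
  ! components in place of POINTER components.
  type t_mixt_alloc
    real(RFREAL), allocatable :: time(:), dt(:), timeOld(:)
  end type t_mixt_alloc

contains

  ! Allocate and zero the three arrays for nCells cells.
  subroutine init_state(state, nCells)
    type(t_mixt_alloc), intent(out) :: state
    integer, intent(in) :: nCells
    allocate(state%time(nCells), state%dt(nCells), state%timeOld(nCells))
    state%time = 0.0_RFREAL
    state%dt = 0.0_RFREAL
    state%timeOld = 0.0_RFREAL
  end subroutine init_state

  ! Same update as the loop in ARK2, on allocatable components.
  subroutine advance_time(state)
    type(t_mixt_alloc), intent(inout) :: state
    integer :: i
    do i = 1, size(state%time)
      state%time(i) = state%timeOld(i) + state%dt(i)
    end do
  end subroutine advance_time

end module mixt_alloc_demo
```

An allocatable array is always contiguous and cannot be reached through another name unless the enclosing object has the TARGET attribute, which gives the compiler more freedom than it has with a POINTER component. Whether this closes the performance gap for a given compiler would need to be measured.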

Quote:

Steve Lionel (Intel) wrote:

Thanks - since time is itself a pointer, you have double-indirection in the initial case. The compiler can do a decent job of optimizing single-level references to pointers, but double-level indirections are more complex.

Do you need to use POINTER for all of these? It seems to me that ALLOCATABLEs might work better for many if not all of the cases where you use POINTER. The compiler can deal better with ALLOCATABLE, though you still have the double-indirection issue.

Hi Steve,

Thank you so much for your reply. Just to clarify:
By double indirection, you mean the following: the first indirection is that we need to load the address from state and then use it to reach state%time; the second is that we need to load the address from state%time (since time is itself a pointer) and then load the floating-point numbers.

In the optimized version, there is only one such indirection (loading the address from time and then loading the floating-point numbers). This is why the original version executes more load instructions and runs more slowly.

Am I correct?

Thanks!

Best Reply

Exactly. It isn't the use of % itself that is slow. But you have double the memory references in the first case, and this probably also interferes with other optimizations.
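Another common way to get the indirection out of the inner loop, sketched here as an alternative to the local pointers in the optimized version and not something proposed in the thread, is to pass the component arrays to a small kernel whose dummy arguments are plain arrays. The module `ark2_kernel_demo` and the routine `update_time` are hypothetical names, and `RFREAL` is again assumed to be double precision.

```fortran
module ark2_kernel_demo
  implicit none
  ! Assumption: RFREAL is the double-precision kind.
  integer, parameter :: RFREAL = kind(1.0d0)
contains
  ! Hypothetical kernel. The dummies are explicit-shape arrays
  ! without POINTER or TARGET, so inside this routine the compiler
  ! may assume they do not overlap and that each base address is
  ! fixed for the whole loop.
  subroutine update_time(n, time, timeOld, dt)
    integer, intent(in) :: n
    real(RFREAL), intent(out) :: time(n)
    real(RFREAL), intent(in) :: timeOld(n), dt(n)
    integer :: i
    do i = 1, n
      time(i) = timeOld(i) + dt(i)
    end do
  end subroutine update_time
end module ark2_kernel_demo

! Possible call site inside ARK2 (a sketch, using the names from the post):
!   call update_time(region%grid%nCells, state%time, state%timeOld, state%dt)
```

The pointers are dereferenced once at the call, and the loop body then works on ordinary arrays. One caveat: if a pointer component refers to a non-contiguous section, passing it to an explicit-shape dummy can trigger a temporary copy, so this fits best when the arrays are known to be contiguous.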

