Performance penalty - allocatable arrays within derived data types


nitya

Hi,

I searched this forum but could not find a relevant answer, so I am posting my question here.

I have a derived data type that contains a number of allocatable arrays. My question is: under what circumstances is it better to use an array of derived data types rather than allocatable arrays within a derived data type? For example:

TYPE Var1
  REAL*8, ALLOCATABLE, DIMENSION(:) :: A
  REAL*8, ALLOCATABLE, DIMENSION(:) :: B
END TYPE Var1

TYPE(Var1) :: V                 ! one object holding two allocatable arrays

  CALL Calculateresult()

SUBROUTINE Calculateresult()
  j = getjValue()               ! the values of j are not sequential
  result = V%A(j) + V%B(j)      ! two array components, same index
END SUBROUTINE Calculateresult

-------------------

OR

-------------------

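TYPE Var2
  REAL*8 :: A
  REAL*8 :: B
END TYPE Var2

TYPE(Var2), ALLOCATABLE, DIMENSION(:) :: VarArray

  j = getjValue()               ! the values of j are not sequential
  CALL Calculateresult(VarArray(j))

SUBROUTINE Calculateresult(Var)
  TYPE(Var2), INTENT(IN) :: Var ! one element: A and B stored side by side
  result = Var%A + Var%B
END SUBROUTINE Calculateresult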

When would it be more efficient to use VarArray and Var2 instead of Var1? Would the answer be any different if Var1 and Var2 were themselves components of another derived data type?

If someone could point me to some literature that explains how nested derived data types are stored in memory, and how to use them efficiently, that would be great.

Thanks

Nitya

Steve Lionel (Intel)

This depends on how your application uses the data, so there is no universal answer. I can tell you that an allocatable component of a derived type is stored as a descriptor for the object; the size of the descriptor depends on the number of dimensions and on whether or not the component is polymorphic. Nested derived types are stored inline: the parent simply contains the storage for its derived-type components.
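As a rough illustration of the storage description above, here is a minimal sketch that uses the standard STORAGE_SIZE intrinsic (Fortran 2008). The type names with_alloc, inner and outer are hypothetical, and the actual bit counts are processor dependent, so none are quoted here. The sketch only checks two things: the size of an object with an allocatable component does not change when the component is allocated (the object holds a descriptor, not the data), and a parent is at least as large as the derived type nested inside it.

```fortran
program layout_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

  type :: with_alloc              ! hypothetical: one allocatable component
    real(real64), allocatable :: a(:)
  end type with_alloc

  type :: inner                   ! hypothetical: plain scalar components
    real(real64) :: a, b
  end type inner

  type :: outer                   ! hypothetical parent holding an inner by value
    integer     :: id
    type(inner) :: pair
  end type outer

  type(with_alloc) :: w
  type(outer)      :: o
  integer :: bits_before, bits_after

  bits_before = storage_size(w)   ! component not yet allocated
  allocate(w%a(100000))
  bits_after = storage_size(w)    ! component now holds 100000 elements

  print *, 'with_alloc bits before/after allocate:', bits_before, bits_after
  print *, 'inner bits:', storage_size(o%pair), ' outer bits:', storage_size(o)

  ! The object itself only stores the descriptor, so its size is unchanged.
  if (bits_before /= bits_after) error stop 'object size changed on allocate'
  ! The nested type is stored inside the parent.
  if (storage_size(o) < storage_size(o%pair)) error stop 'outer smaller than inner'
end program layout_sketch
```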

In the example you show, the second version would be more efficient, as there is just one fetch through a descriptor rather than two. If you know that the two values, A and B, for a given array index are going to be used together, then an array of derived type is better than a derived type of arrays.
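To make the comparison concrete, here is a small self-contained sketch of both layouts accessed with a non-sequential index. The names arrays_t, pair_t and add_pair are hypothetical, and the expression 1 + mod(i*37, n) is only a stand-in for the original getjValue(). It shows the two access patterns side by side and checks that they give the same answer; it does not measure anything, so timing would have to be done on real data.

```fortran
program aos_vs_soa
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 1000

  type :: arrays_t                ! layout 1: derived type of arrays
    real(real64), allocatable :: a(:), b(:)
  end type arrays_t

  type :: pair_t                  ! layout 2: element of an array of derived type
    real(real64) :: a, b
  end type pair_t

  type(arrays_t)            :: v
  type(pair_t), allocatable :: var_array(:)
  integer      :: i, j
  real(real64) :: r1, r2

  allocate(v%a(n), v%b(n), var_array(n))
  do i = 1, n
    v%a(i) = real(i, real64)
    v%b(i) = 2.0_real64 * real(i, real64)
    var_array(i) = pair_t(v%a(i), v%b(i))
  end do

  r1 = 0.0_real64
  r2 = 0.0_real64
  do i = 1, n
    j = 1 + mod(i * 37, n)                 ! scattered index, always in 1..n
    r1 = r1 + (v%a(j) + v%b(j))            ! two descriptors, two separate arrays
    r2 = r2 + add_pair(var_array(j))       ! one descriptor, A and B adjacent
  end do

  print *, 'derived type of arrays:', r1
  print *, 'array of derived type :', r2
  if (r1 /= r2) error stop 'layouts disagree'

contains

  pure function add_pair(p) result(s)
    type(pair_t), intent(in) :: p
    real(real64) :: s
    s = p%a + p%b
  end function add_pair

end program aos_vs_soa
```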

Steve
nitya

Thanks for your help.

jimdempseyatthecove

Also, in example 2, A and B are likely to reside in the same cache line, meaning that one memory read loads both variables into L1 cache. Your actual use may change this. Knowing this, however, if you have a large type (larger than one cache line), it may be beneficial to order the variables in the type to improve the probability of cache line hits. This may mean declaring the components in non-alphabetical order.
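One way to sketch the component-ordering idea: declare the fields that are used together next to each other, ahead of rarely used data, and check how far apart they end up. Everything here is an assumption for illustration: the type particle_t, its rarely_used component, and the 64-byte line size. BIND(C) is used only so that the component order is guaranteed to follow the declaration order. Note that two fields lying within 64 bytes of each other can still straddle a line boundary if the object is not aligned to one.

```fortran
program hot_fields_sketch
  use, intrinsic :: iso_c_binding, only: c_double, c_intptr_t, c_loc
  implicit none

  integer, parameter :: line_bytes = 64    ! assumed cache-line size

  type, bind(c) :: particle_t              ! hypothetical large type
    real(c_double) :: a, b                 ! used together: declared adjacently
    real(c_double) :: rarely_used(32)      ! cold data placed after the hot fields
  end type particle_t

  type(particle_t), target :: p
  integer(c_intptr_t) :: addr_a, addr_b

  ! Convert the C addresses of the two components to integers.
  addr_a = transfer(c_loc(p%a), addr_a)
  addr_b = transfer(c_loc(p%b), addr_b)

  print *, 'byte distance between a and b:', addr_b - addr_a
  if (abs(addr_b - addr_a) >= line_bytes) error stop 'a and b too far apart'
end program hot_fields_sketch
```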

On the other hand, when your program runs sequentially through the indices (DO I=1,N), organizing the data as in the first example may yield better opportunities for vectorization (and faster code).

Choose the technique that meets your requirements.
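A minimal sketch of the sequential case, using the derived-type-of-arrays layout from the first example. The names arrays_t and res are hypothetical. Each array is contiguous, so the loop reads A and B with unit stride, which is the pattern compilers generally find easiest to vectorize. Whether a given compiler actually does so depends on its options and is best confirmed from its optimization report.

```fortran
program sequential_sweep
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 1000

  type :: arrays_t                ! derived type of arrays, as in example 1
    real(real64), allocatable :: a(:), b(:)
  end type arrays_t

  type(arrays_t) :: v
  real(real64), allocatable :: res(:)
  integer :: i

  allocate(v%a(n), v%b(n), res(n))
  do i = 1, n
    v%a(i) = real(i, real64)
    v%b(i) = 2.0_real64 * real(i, real64)
  end do

  ! Unit-stride sweep over two contiguous arrays: a candidate for SIMD.
  do i = 1, n
    res(i) = v%a(i) + v%b(i)
  end do

  print *, 'first and last results:', res(1), res(n)
  if (any(res /= [(3.0_real64 * i, i = 1, n)])) error stop 'unexpected result'
end program sequential_sweep
```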

Jim Dempsey

www.quickthreadprogramming.com
