Derived Data Types Vs. Standard Arrays

This may be purely case dependent, but what I am dealing with is code that was originally written in Fortran 77. It stores a variety of information about a specific variable in DOUBLE PRECISION arrays and uses IDINT to extract the integer fields. This design unfortunately forces a lot of unnecessary loops just to find location indices.

For example

DOUBLE PRECISION,DIMENSION(100):: A        !CONTAINS ID INFORMATION, IT IS A LITTLE MORE COMPLEX, BUT SIMPLIFIED FOR THIS EXAMPLE
DOUBLE PRECISION,DIMENSION(8,5000):: B   !ROW INDEX HOLDS INFORMATION
!MEANING OF ROW INDEX OF B for the Jth item
ROW=IDINT(B(1,J))
COL=IDINT(B(2,J))
LAY=IDINT(B(3,J))
ID1=IDINT(B(4,J))   !LINKING ID TO OTHER PARTS OF CODE
ID2=IDINT(B(5,J))   !LINKING ID TO OTHER PARTS OF CODE
VAL1=B(6,J)
VAL2=B(7,J)
VAL3=B(8,J)
!THERE ARE A LOT OF SEARCHES THAT MATCH THE ID LOCATION OF A WITH B TO PULL VAL1, VAL2, and VAL3 into other parts of the code
DO I=1,100
  DO J=1,5000
    IF (IDINT(A(I))==IDINT(B(4,J))) THEN
      !...USE VAL1, VAL2, AND VAL3 DEPENDING ON LOCATION IN CODE
    END IF
  END DO
END DO

What I am curious about is: is there a performance penalty for going with derived data types compared to static arrays?

Something I like to do is the following:

TYPE BB
 INTEGER::R,C,L,WID,FID                             !(EQUIVALENT TO B(1:5,:))
 DOUBLE PRECISION::Q,QMAX,QOLD                      !(EQUIVALENT TO B(6:8,:))
END TYPE
TYPE(BB),DIMENSION(:),ALLOCATABLE:: B

If I kept everything the same, would there be any performance hit by switching to derived data types?

I even like to go as far as building a derived data type composed of other derived data types, such as:

TYPE AA
 TYPE(BB),POINTER,DIMENSION(:):: B      !THE ENTRIES OF THE ORIGINAL B THAT BELONG TO THIS ELEMENT OF A
 TYPE(CC),POINTER,DIMENSION(:):: C
 TYPE(DD),POINTER,DIMENSION(:):: D
END TYPE
TYPE(AA),DIMENSION(100):: A
DO I=1,100
 N= XXX             !N IS THE NUMBER OF ENTRIES OF "DOUBLE PRECISION,DIMENSION(8,5000):: B" FROM THE FIRST EXAMPLE THAT ARE ASSOCIATED WITH A(I); THE INDEX I OF A(I) TAKES THE PLACE OF ID1 IN THE ORIGINAL B
 ALLOCATE(A(I)%B(N))
END DO

I know that there could be a lot more going on speed-wise, but it would be interesting to hear general opinions on derived data types in terms of overall code speed. Speed is a major issue in this code since it is part of a large simulation program, and pieces of B and A are pulled to help assemble the system matrices that are numerically solved. I think whatever hit I take for using derived data types will be overcome by the removal of unnecessary looping.
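
For instance, just as a sketch (USE_VALUES here is a stand-in for whatever the code actually does with the values), the ID search would collapse to looping over only the entries that belong to A(I):

DO I=1,100
   DO J=1,SIZE(A(I)%B)
      CALL USE_VALUES(A(I)%B(J)%Q, A(I)%B(J)%QMAX, A(I)%B(J)%QOLD)   !NO SEARCH OVER ALL 5000 ENTRIES
   END DO
END DO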

One last question: would it be better to just allocate a maximum size for B, or, if the size N for a specific A(I) changes, DEALLOCATE/ALLOCATE B so that it is always exactly the correct size?
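
Roughly, the two options I have in mind would look like this (NMAX is just a hypothetical upper bound):

!OPTION 1: ALLOCATE ONCE TO A MAXIMUM SIZE AND REUSE
IF (.NOT.ALLOCATED(B)) ALLOCATE(B(NMAX))

!OPTION 2: KEEP B EXACTLY SIZED, REALLOCATING WHENEVER N CHANGES
IF (ALLOCATED(B)) THEN
   IF (SIZE(B)/=N) THEN
      DEALLOCATE(B)
      ALLOCATE(B(N))
   END IF
ELSE
   ALLOCATE(B(N))
END IF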

Thanks for all your inputs.

Scott

I agree that derived types could be used to clean this up. With an array of derived type, as you have written it first, the potential for optimization is about the same as in your original code (at least if you place the doubles before the integers in the hope of better alignment); vectorizability would be improved if the derived type contains arrays, as you seem to be hinting. I think the usual preference for allocatable over pointer arrays may apply, as you can surely assume at least F95 compilers. I'm not sure what it will take to get desirable (32-byte or more) alignment for the individual arrays inside a derived type, e.g. whether they all need to be sized as multiples of 32 bytes.
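
For illustration only, a rough sketch of what a derived type that contains arrays could look like (component names are placeholders; allocatable components need the TR 15581/F2003 feature, which current compilers support):

TYPE BB_SOA
   INTEGER,ALLOCATABLE,DIMENSION(:):: R,C,L,WID,FID
   DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:):: Q,QMAX,QOLD
END TYPE
TYPE(BB_SOA):: B
ALLOCATE(B%Q(5000),B%QMAX(5000),B%QOLD(5000))   !INNER LOOPS OVER B%Q ETC. RUN WITH UNIT STRIDE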

Is it better to put the DOUBLE PRECISION variables first in a derived data type for optimization?

Would it be faster to structure it this way:

TYPE BB
DOUBLE PRECISION::Q,QMAX,QOLD
INTEGER::R,C,L,WID,FID                            
END TYPE
TYPE(BB),DIMENSION(:),ALLOCATABLE:: B

I think your issue is more about the use of pointers than of derived types. But leaving that aside, you need to keep in mind the classic issues of data locality and stride when accessing your data structures. Try to keep things that are referenced in sequence close together in memory. As for the order of components, the old recommendation to put the biggest types first still holds. Proper data alignment helps a lot. If you have an array, you also want to make sure that the size of each element is an integral multiple of the largest component size (8 for DOUBLE PRECISION, etc.).

Steve

Thanks a ton for those comments. So it would be better to have declarations lumped together by type?

e.g. this would be bad:

      DOUBLE PRECISION::A,B
      INTEGER::C,D
      DOUBLE PRECISION::E,F

and it would be better to have it:

      DOUBLE PRECISION::A,B
      DOUBLE PRECISION::E,F
      INTEGER::C,D

I do not understand your comment that the "size of each element is an integral multiple of the largest component size (8 for DOUBLE PRECISION, etc.)".

Do you mean that a double precision allocatable array performs best when its dimensions are multiples of 8? For example

DOUBLE PRECISION,ALLOCATABLE,DIMENSION(:,:,:)::A
ALLOCATE(A(64,16,8))
 

or did you mean something like this:

TYPE BB
DOUBLE PRECISION,DIMENSION(5)::ARRAY
DOUBLE PRECISION::Q,QMAX,QOLD
INTEGER::R,C,L,WID,FID                            
END TYPE
TYPE(BB),DIMENSION(:),ALLOCATABLE:: B
N= ...               !A MULTIPLE OF 8, SINCE B%ARRAY IS DOUBLE PRECISION
ALLOCATE(B(N))

Thanks for all the interesting info.

What I meant was that if you have an array of derived type, and you have DOUBLE PRECISION components, you want the total size of the type to be a multiple of 8.  By default, the compiler will add padding so you get this, but it's not a bad idea to keep it in mind when you design the code. I wasn't referring to the number of elements in the array.
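
For example, assuming a compiler that has the F2008 STORAGE_SIZE intrinsic, a sketch like this shows the padding directly:

TYPE BB
   DOUBLE PRECISION:: Q,QMAX,QOLD      !24 BYTES
   INTEGER:: R,C,L,WID,FID             !20 BYTES
END TYPE
TYPE(BB):: X
PRINT *, STORAGE_SIZE(X)/8             !TYPICALLY PRINTS 48, NOT 44, BECAUSE OF TRAILING PADDING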

Steve

Thanks so much, that is really interesting about the derived data types.

Does the same hold true for declarations in the main program and in subroutines?

For example:

PROGRAM SLOW
INTEGER:: I,J
DOUBLE PRECISION::A,B
TYPE SIMPLE
  INTEGER:: I,J
  DOUBLE PRECISION::A,B
END TYPE

Would it be better written like this:

PROGRAM FAST
DOUBLE PRECISION::A,B       !DECLARED VARIABLE SWITCH
INTEGER:: I,J
TYPE SIMPLE
  DOUBLE PRECISION::A,B      !TYPE VARIABLE SWITCH
  INTEGER:: I,J
END TYPE

Right now a legacy code that I am improving/updating has all its declarations in the following order:

PROGRAM LEGACY
MODULE One_Section_Of_Code
!SCALARS
INTEGER,POINTER
DOUBLE PRECISION,POINTER
REAL,POINTER
INTEGER,POINTER
! 1D Arrays
CHARACTER(LEN=16),POINTER,DIMENSION(:)
INTEGER,POINTER,DIMENSION(:)
DOUBLE PRECISION,DIMENSION(:)
LOGICAL,POINTER,DIMENSION(:)
CHARACTER(LEN=20),POINTER,DIMENSION(:)
!2D arrays
INTEGER,POINTER,DIMENSION(:,:)
DOUBLE PRECISION,POINTER,DIMENSION(:,:)
!3D Arrays
DOUBLE PRECISION,POINTER,DIMENSION(:,:,:)
!DERIVED DATA TYPES
TYPE A1
...
END TYPE
END MODULE
END PROGRAM

Unfortunately, the way the code is structured, I have to use pointer arrays rather than allocatable ones. It also likes to have all the scalar declarations be pointers so that they remain unassociated if a certain piece is unused (the code has about 30 pieces, but during a simulation it may only use a fraction of them).

I could come up with a scheme that uses the TARGET and ALLOCATABLE attributes for the previous declarations if that might produce an improvement over the POINTER attribute.

Thanks for all your help,

Scott

The order of variable declarations doesn't matter - the compiler will properly align them.

Steve

This is changing the topic, but would changing the pointer variables to target,allocatable improve speed at all? 

The code makes heavy use of pointers: all the main information is held in a large derived type of pointers, and each model subsection then points local variables at the global data type (say TYPE(GLOBAL),DIMENSION(10), where each element contains the set of variables used by all the subroutines of the program). This is done so that the main code can use variables of the same name but have them point to different values for repeated calls of the subroutines.
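
Schematically, the pattern is something like this (TYPE GLOBAL and the HEAD component are only placeholders for the real names):

TYPE(GLOBAL),DIMENSION(10),TARGET:: GDAT
DOUBLE PRECISION,POINTER,DIMENSION(:,:):: HEAD        !LOCAL NAME USED BY THE SUBROUTINES

DO ISUB=1,10
   HEAD => GDAT(ISUB)%HEAD      !SAME LOCAL NAME, DIFFERENT STORAGE FOR EACH CALL
   CALL RUN_SUBSECTION(HEAD)
END DO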

ALLOCATABLE gives the compiler more information than POINTER - in particular, it knows that the arrays are contiguous. But if you're using pointers to point to these arrays, you'll lose that benefit. I suggest you look elsewhere for speed improvements. Running the program with Intel VTune Amplifier XE would help you locate the hotspots where you can focus your efforts.
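
For illustration only: if the storage itself can be declared TARGET,ALLOCATABLE and the local aliases carry the F2008 CONTIGUOUS attribute, some of that contiguity information can be handed back to the compiler (HEAD_STORE, NROW and NCOL below are placeholders):

DOUBLE PRECISION,ALLOCATABLE,TARGET,DIMENSION(:,:):: HEAD_STORE
DOUBLE PRECISION,POINTER,CONTIGUOUS,DIMENSION(:,:):: HEAD

ALLOCATE(HEAD_STORE(NROW,NCOL))
HEAD => HEAD_STORE            !THE COMPILER MAY NOW ASSUME UNIT STRIDE THROUGH HEAD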

Steve

Thanks for all your help with this.
