Loop vectorization / Complicated array access


NsK

Hi,
I have some issues with loop vectorization:

module global
    implicit none
   
    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A
 
end module global
program test
    use global
    implicit none
    type(type_A), target, allocatable, dimension(:) :: A
    type(type_A), target, allocatable               :: AA
    real(kind=4), pointer, dimension(:) :: ptr
    integer :: i
   
    !---
    allocate(AA)
    allocate(AA%val(10000))
    AA%val = 1.0
    
    ptr => AA%val
  
    do i = 1, 100
        !
        ptr(i) = exp(-ptr(i) + 1.0)
        !
    end do
    !---
    
    !---
    allocate(A(1))
    allocate(A(1)%val(10000))
    A(1)%val = 1.0
    
    ptr => A(1)%val
   
    do i = 1, 100
        !
        ptr(i) = exp(-ptr(i) + 1.0)
        !
    end do
    !---    
    
    write(*,*) ptr(500)
end program test

Compiled with /Qvec-report3, it produces:

1>main.f90(20): (col. 5) remark: loop was not vectorized: unsupported loop structure.
1>main.f90(24): (col. 5) remark: LOOP WAS VECTORIZED.
1>main.f90(32): (col. 5) remark: loop was not vectorized: unsupported loop structure.
1>main.f90(34): (col. 5) remark: loop was not vectorized: unsupported loop structure.
1>main.f90(38): (col. 5) remark: loop was not vectorized: existence of vector dependence.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 18) remark: vector dependence: assumed ANTI dependence between (unknown) line 40 and (unknown) line 40.
1>main.f90(40): (col. 9) remark: vector dependence: assumed FLOW dependence between (unknown) line 40 and (unknown) line 40.

The only way to get the second loop vectorized seems to be to add the !dir$ ivdep directive before it.
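
For readers unfamiliar with the directive, here is a minimal sketch of where it goes, reusing the type and the second loop from the program above. The program name ivdep_sketch is made up for illustration. !dir$ ivdep is Intel-specific and only tells the compiler to ignore dependences it assumes but cannot prove. It is reasonable here because iteration i reads and writes only ptr(i).

```fortran
module global
    implicit none

    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A

end module global

program ivdep_sketch   ! hypothetical name, not from the original post
    use global
    implicit none
    type(type_A), target, allocatable, dimension(:) :: A
    real(kind=4), pointer, dimension(:) :: ptr
    integer :: i

    allocate(A(1))
    allocate(A(1)%val(10000))
    A(1)%val = 1.0

    ptr => A(1)%val

    ! The directive applies to the DO loop that immediately follows it.
    ! Iterations are independent here, so ignoring the assumed
    ! FLOW/ANTI dependences does not change the result.
    !dir$ ivdep
    do i = 1, 100
        ptr(i) = exp(-ptr(i) + 1.0)
    end do

    write(*,*) ptr(1), ptr(500)
end program ivdep_sketch
```

Compilers that do not recognize the directive treat the line as an ordinary comment, so the sketch stays portable even where it has no effect.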
From an old post by Steve (Mon, 02/06/2006 - 18:33):

Quote:

Steve Lionel (Intel) wrote:
[...]
The compiler does not try to vectorize loops where the array access is complicated.[...]
It is a fact that arrays that are components of derived types, especially in conjunction with pointer or allocatable, complicate life for the compiler and as such some optimization opportunities may be missed.
[...]

My understanding is that my issue is related to the complicated array access. Is there a way to make this clear to the compiler without using the vectorization directive on each loop of the code?
Cheers,

Nick

Steve Lionel (Intel)
Best Reply

The compiler has changed a lot since 2006, and processors have changed to include new instructions that can help with vectorization. For example, I tried your code with the 14.0 compiler and got this:

C:\Projects\U480019.f90(20): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(24): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(34): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(38): (col. 5) remark: LOOP WAS VECTORIZED
C:\Projects\U480019.f90(32): (col. 5) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate

Looks pretty good to me.

Steve
NsK

Indeed, I had not realized how many versions have been released since XE 2011 (12.1.3526.2010, February 2012).
My bad luck strikes again.

Nick

John Campbell

A lot has changed in Fortran!
I don't understand the need for such a complex data structure. Each of the 4 effective loop structures in the following code vectorises, without resorting to the more complex data structures of the original post. I really don't know what can be achieved by your coding approach.
My suggestion is KISS: keep it simple.

module global
    implicit none
    real(kind=4), allocatable, dimension(:) :: A_val, AA_val
end module global

program test
    use global
    implicit none
    integer :: i
    !---
    allocate (AA_val(10000))
    AA_val = 0.5
    AA_val(1:100) = exp(-AA_val(1:100) + 1.0)
    write(*,*) AA_val(100), AA_val(500)
    !---
    allocate (A_val(10000))
    A_val = 1.0
    do i = 1, 100
        A_val(i) = exp(-A_val(i) + 1.0)
    end do
    !---
    write(*,*) A_val(500)
end program test

NsK

Well,
Indeed, I had not realized how many versions of the compiler had been released since my XE 2011 12.1.3526.2010 (February 2012), but this has nothing to do with changes in Fortran.
The more complex data structure of the original post exists precisely to simplify the coding: the pointer approach makes the derived data type (and the number of objects, a runtime parameter) transparent to the developers and to the algorithm.
Unfortunately, not all codes have the same requirements.
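
To make the pattern concrete, here is a rough sketch of what such a pointer-based layout might look like when the number of objects is only known at run time. It is an illustration built on the type from the original post, not NsK's actual code: the names many_objects, n_obj and k are assumptions, and n_obj is hard-coded where a real program would read it at run time.

```fortran
module global
    implicit none

    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A

end module global

program many_objects   ! hypothetical name
    use global
    implicit none
    type(type_A), target, allocatable, dimension(:) :: A
    real(kind=4), pointer, dimension(:) :: ptr
    integer :: i, k, n_obj

    n_obj = 3          ! assumption: stands in for a runtime parameter

    ! Each object owns its own array, so sizes could differ per object.
    allocate(A(n_obj))
    do k = 1, n_obj
        allocate(A(k)%val(10000))
        A(k)%val = 1.0
    end do

    ! The inner loop only sees ptr; it does not need to know about
    ! the derived type or how many objects exist.
    do k = 1, n_obj
        ptr => A(k)%val
        do i = 1, 100
            ptr(i) = exp(-ptr(i) + 1.0)
        end do
    end do

    write(*,*) A(n_obj)%val(1), A(n_obj)%val(500)
end program many_objects
```

The inner loop has the same shape as the second loop of the original post, so whether it vectorizes depends on the compiler version in the same way.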

Nick
jimdempseyatthecove

John,

NsK produced a small sample code that exhibited his issue. In his case he had an array of arrays. This type of structure can be used for sparse arrays, among other things. Use of a pointer can sometimes cause optimization issues due to the possibility of aliasing and stride. If NsK's compiler is new enough to have ASSOCIATE, he might try that instead of a pointer.
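
A minimal sketch of the ASSOCIATE variant, assuming the type from the original post. The program name associate_sketch and the associate-name v are made up for illustration. ASSOCIATE is a Fortran 2003 construct. Whether a given compiler version then vectorizes the loop is not guaranteed and would need to be checked against the vectorization report.

```fortran
module global
    implicit none

    type type_A
        real(kind=4), allocatable, dimension(:) :: val
    end type type_A

end module global

program associate_sketch   ! hypothetical name
    use global
    implicit none
    type(type_A), allocatable, dimension(:) :: A   ! no TARGET attribute needed
    integer :: i

    allocate(A(1))
    allocate(A(1)%val(10000))
    A(1)%val = 1.0

    ! v is an alias for A(1)%val inside the construct. Unlike a pointer,
    ! it cannot be re-pointed at other storage within the block.
    associate (v => A(1)%val)
        do i = 1, 100
            v(i) = exp(-v(i) + 1.0)
        end do
    end associate

    write(*,*) A(1)%val(1), A(1)%val(500)
end program associate_sketch
```

The design trade-off is that the alias is scoped to the block, so code that needs to retarget the alias between objects would open a new ASSOCIATE block per object.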

Jim Dempsey

www.quickthreadprogramming.com
