vectorization bug on 11.0.074 (Linux-intel64)
I believe I have found a bug in the Fortran compiler 11.0 (Linux). Our lab is a licensed user of the product, and I previously opened an issue on the Intel Premier Account Interface but received no immediate help. Example code for the problem is given at the very end of this message.
The original bug essentially came down to the following: if trigonometric functions are called inside a loop with complex data structures (the vectorizer complains "subscript too complex"), they appear to break the dependency recognition and force vectorization, which produces an incorrect result.
(Unacceptable) fixes include:
Compiler flags: -O0, -O1, or -no-vec
Code changes:
- Explicitly introduce a dependence in the loop through write statements
- Simplify the data structures
- Use other intrinsic functions (like sqrt) instead of trigonometric ones (by now, however, I have a similar bug that does not involve a single intrinsic)
Architecture changes:
- Go to 32-bit
For reference, example code is given below. When compiled with -O3 as the only option, it yields a value of zero for the variable "val", which is clearly incorrect. This can be fixed by any one (or more) of the fixes listed above. This is a major bug for scientific code such as the code we work on and hence very important to us.
The previous compiler we used (version 10.1) does deal with this correctly.
Please note that since this first bug I have encountered another, similar issue with complicated data structures and simple loops in which the data dependency is not recognized properly. The results are simply wrong, with no warning or diagnostic of any kind, even when using -vec-report3.
Has this been fixed in 11.0.081, or is it at least being worked on? Right now, version 11 is a strict no-go for us.
Andreas
---------------------------------------------------------
module one
  type t_dummy
    real(8), ALLOCATABLE:: it(:,:)
    real(8), ALLOCATABLE:: f(:)
  end type t_dummy
  type(t_dummy), ALLOCATABLE:: dummy(:)
  integer, ALLOCATABLE:: imap(:)
  integer, ALLOCATABLE:: bmap(:)
end module one

program abug
  use one
  implicit none
  integer ndum,da,db,i
  real(8), ALLOCATABLE:: prns(:)
  real(8) val

  ndum = 10
  da = 2
  db = 100

  allocate(dummy(ndum))
  do i=1,ndum
    allocate(dummy(i)%it(da,db))
  end do
  allocate(bmap(ndum))
  allocate(prns(ndum))
  call random_number(prns)
  bmap(:) = int(10.0*prns - 0.5) + 1
  deallocate(prns)
  allocate(imap(db))
  allocate(prns(db))
  call random_number(prns)
  imap(:) = int(30.0*prns - 0.5) + 1
  deallocate(prns)

  call s2(val)
  write(*,*) val

  deallocate(imap)
  deallocate(bmap)
  do i=1,ndum
    deallocate(dummy(i)%it)
  end do
  deallocate(dummy)
end

subroutine s1(i,ttc,arg3,val)
  use one
  implicit none
  integer i,j,ttc
  real(8), INTENT(in):: arg3
  real(8), INTENT(inout):: val
  real(8) rvec(3)

  call random_number(rvec)

  do j=1,3
    dummy(bmap(i))%it(ttc,3*imap(i)+j) = arg3*rvec(j)*sin(rvec(1))
    val = val + dummy(bmap(i))%it(ttc,3*imap(i)+j)
  end do
end

subroutine s2(val)
  use one
  implicit none
  integer i,ttc,j,imol
  real(8) arg3
  real(8), INTENT(inout):: val

  ttc = 1
  imol = 1
  dummy(imol)%it(ttc,:) = 0.0
  val = 0.0

  do i=1,9
    call random_number(arg3)
    call s1(i,ttc,arg3,val)
  end do
end
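
To illustrate what the "simplify data structures" code change could look like for s1, here is a minimal sketch. It relies on module one from the example above. The routine name s1_simple and the locals ib, col0, s and tmp are hypothetical and not part of the original code. It resolves the indirect subscripts and the loop-invariant sin call once, before the loop. Whether this particular form avoids the wrong result on 11.0.074 is an assumption and has not been verified here.

```fortran
! Sketch only: same arithmetic as s1, with indirect subscripts hoisted.
! s1_simple, ib, col0, s and tmp are hypothetical names for illustration.
subroutine s1_simple(i,ttc,arg3,val)
  use one
  implicit none
  integer, INTENT(in):: i,ttc
  real(8), INTENT(in):: arg3
  real(8), INTENT(inout):: val
  real(8) rvec(3),s,tmp
  integer j,ib,col0

  call random_number(rvec)

  ib   = bmap(i)       ! block index, looked up once
  col0 = 3*imap(i)     ! base column, looked up once
  s    = sin(rvec(1))  ! loop-invariant trigonometric call

  do j=1,3
    tmp = arg3*rvec(j)*s
    dummy(ib)%it(ttc,col0+j) = tmp  ! store through simple subscripts
    val = val + tmp                 ! accumulate from the local copy
  end do
end
```

Replacing the call to s1 in s2 with a call to s1_simple should give the same value of "val" as a correct build of the original, since only the order of index lookups changes and not the arithmetic.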