Hi,

I am using ifort (the Intel Fortran compiler) to perform scientific calculations.

According to Valgrind, the most CPU-intensive subroutine is the derivative routine, which performs a simple 2D convolution (a 5-point stencil with position-dependent coefficients):

    do j = 1, Nx2
       do i = 1, Nx1
          DF(i,j) = A(i,j,1) * F(i,  j  ) &
                  + A(i,j,2) * F(i-1,j  ) &
                  + A(i,j,3) * F(i+1,j  ) &
                  + A(i,j,4) * F(i,  j-1) &
                  + A(i,j,5) * F(i,  j+1)
       end do
    end do
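
For anyone who wants to reproduce the loop, here is a minimal self-contained sketch. It assumes F is declared with a one-cell halo (0:Nx1+1, 0:Nx2+1) so that the i-1, i+1, j-1 and j+1 accesses stay in bounds. The program name, the array sizes, the double-precision kind and the test coefficients are illustrative assumptions, not taken from the original code.

```fortran
program stencil_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)        ! assumed precision
  integer, parameter :: Nx1 = 64, Nx2 = 48      ! illustrative sizes (assumption)
  real(dp), allocatable :: A(:,:,:), F(:,:), DF(:,:)
  integer :: i, j

  ! F carries a one-cell halo so that F(i-1,j), F(i+1,j),
  ! F(i,j-1) and F(i,j+1) are valid for every interior point
  allocate(A(Nx1,Nx2,5), F(0:Nx1+1,0:Nx2+1), DF(Nx1,Nx2))

  ! Test data: F = 1 everywhere and coefficients that sum to zero,
  ! so every DF(i,j) should come out exactly zero
  F = 1.0_dp
  A(:,:,1)   = -4.0_dp
  A(:,:,2:5) =  1.0_dp

  ! j outermost, i innermost: i is the contiguous index in Fortran's
  ! column-major layout, so the inner loop has unit stride
  do j = 1, Nx2
     do i = 1, Nx1
        DF(i,j) = A(i,j,1) * F(i,  j  ) &
                + A(i,j,2) * F(i-1,j  ) &
                + A(i,j,3) * F(i+1,j  ) &
                + A(i,j,4) * F(i,  j-1) &
                + A(i,j,5) * F(i,  j+1)
     end do
  end do

  ! Self-check: abort if the stencil did not cancel as expected
  if (maxval(abs(DF)) > 0.0_dp) error stop "stencil check failed"
  print *, "max |DF| =", maxval(abs(DF))
end program stencil_sketch
```

The loop order is the same as in the routine above: i is the innermost index, which is the contiguous one in Fortran.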

I am trying to use the BLAS library to speed up this calculation, but I have not found an appropriate way to do it.

I think the best way would be to unroll the loop and then use Level 2 BLAS (matrix-vector) routines, but I am not sure about that.

Does anyone have an idea of how to optimize this calculation?