Hello,

in order to parallelize completely a F90 application, I need to compute 2D FFT with 1D transforms (they will be distributed among OpenMP threads). These are real to complex transformations and it doesn't seem to work (values are wrong). First, I wrote a sample program before implementing the all thing in the real application. To write it, I looked in the MKL's documentation, in particular, the C-23 example (reference manual, p 3268, version -29 08/2008).

The computation in the first dimension works well, the problem appears when the transformation takes place in the second dimension : so values are wrong, and I do not understand why. I take trivial values for the entry signal (U). Here is the small example :

program dft1D

use mkl_dfti

implicit none

integer, parameter :: nx = 8, ny = 16

real(8), dimension(0:nx-1,0:ny-1) :: U

real(8), dimension(0:nx-1,0:ny-1) :: V

complex(8), dimension(0:nx-1,0:ny/2) :: F

!

real(8) :: pi, usy, xk, errg

integer :: i, k

!

TYPE(DFTI_DESCRIPTOR), POINTER :: desc

INTEGER :: precision

INTEGER :: domain, status

INTEGER :: dim, length

INTEGER, DIMENSION(2) :: input_strides, output_strides

!

pi = acos (-1.0_8)

usy = 1.0_8 / real(ny,8)

DO k = 0, ny-1

xk = 2.0_8 * pi * real(k,8) * usy

U(0,k) = 1.0_8

U(1,k) = cos (xk)

U(2,k) = 2.0_8

U(3,k) = sin (xk)

U(4,k) = 3.0_8

U(5,k) = cos (xk+xk)

U(6,k) = 4.0_8

U(7,k) = sin (xk+xk)

DO i = 0, 7

V(i,k) = 0.0_8

END DO

END DO

!

DO k = 0, ny/2

do i = 0, nx-1

F(i,k) = CMPLX (0.0_8, 0.0_8, 8)

end do

end do

!

precision = DFTI_DOUBLE

domain = DFTI_REAL

dim = 1

length = ny

status = DftiCreateDescriptor (desc, precision, domain, dim, length)

Status = DftiSetValue (desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE)

Status = DftiSetValue (desc, DFTI_FORWARD_SCALE, usy)

Status = DftiSetValue (desc, DFTI_BACKWARD_SCALE, 1.0_8)

Status = DftiSetValue (desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL)

input_strides = (/0, nx/)

Status = DftiSetValue (desc, DFTI_INPUT_STRIDES, input_strides)

output_strides = (/0, nx/)

Status = DftiSetValue (desc, DFTI_OUTPUT_STRIDES, output_strides)

status = DftiCommitDescriptor (desc)

!

DO i = 0, nx-1

status = dfti_compute_forward_dz (desc, U(i,0), F(i,0) )

end do

DO i = 0, nx-1

status = dfti_compute_backward_zd (desc, F(i,0), V(i,0) )

END DO

!

DO i = 0, nx - 1

errg = 0.0_8

DO k = 0,ny - 1

errg = max(errg, abs( U(i,k) - V(i,k) ) )

END DO

write (6,'(A,I3,A,E11.4)') 'errg (', i, ') = ', errg

write (6,*)

END DO

status = DftiFreeDescriptor (desc)

stop

end program dft1D

I compile the code with :

ifort -check all -warn all -O0 -g -traceback /opt/intel/Compiler/11.1/064/mkl/include/mkl_dfti.f90 small.f90 -o a.out -L /opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm

The output is :

errg ( 0) = 0.0000E+00

errg ( 1) = 0.1707E+01

errg ( 2) = 0.0000E+00

errg ( 3) = 0.1848E+01

errg ( 4) = 0.0000E+00

errg ( 5) = 0.2220E-15

errg ( 6) = 0.0000E+00

errg ( 7) = 0.2220E-15

Thank you for your comments.

Regards,