Calling local sizes of 3D MPI FFT plans

Hello All,

I am porting a legacy code that uses FFTW 2.1.5 from Linux to Windows, so I have built and successfully linked against MKL's FFTW wrappers.  My question is about the wrappers' behavior for a three-dimensional FFT, specifically the wrapper function fftwnd_mpi_local_sizes().  Shown below are the original FFTW output arguments and the MKL values the wrapper maps them to.

fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
int *local_nx -> int *CDFT_LOCAL_NX,
int *local_x_start -> int *CDFT_LOCAL_X_START,
int *local_ny_after_transpose -> int *CDFT_LOCAL_OUT_NX,
int *local_y_start_after_transpose -> int *CDFT_LOCAL_OUT_X_START,
int *total_local_size -> int *CDFT_LOCAL_SIZE)

local_ny_after_transpose and local_y_start_after_transpose are not being set to the values that the original FFTW implementation returns. Our layout and data allocation across the MPI processes rely heavily on the original output.  After looking over the MKL documentation, it appears that this is all MKL's FFT can provide; unfortunately, the Y values are critical.

For example, for a 36 by 16 by 14 (X, Y, Z) transform over 2 processors, FFTW is expected to return processor_1(plan,18,0,8,0,4032) and processor_2(plan,18,18,8,8,4032), but MKL returns processor_1(plan,18,0,18,0,4032) and processor_2(plan,18,18,18,18,4032). This particular case may be predictable, but the sizes of X, Y, Z and the number of processors are arbitrary, so in general the values are not easy to predict.  Are there any solutions to this problem?
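
To make the expected values concrete, here is a minimal sketch of how the five outputs could be computed by hand. It assumes each dimension is split into blocks of ceil(n / n_pes) planes, which reproduces the numbers in the example above but should be checked against the FFTW 2.1.5 source before relying on it. The names block_slab and local_sizes_3d are hypothetical and are not part of FFTW or MKL, and taking total_local_size as the larger of the two slab sizes is also an assumption.

```c
#include <assert.h>

/* Hypothetical helper (not an FFTW or MKL function): the slab of a
   dimension of length n owned by rank my_pe out of n_pes ranks.
   ASSUMPTION: blocks of ceil(n / n_pes) planes, last block may be short. */
void block_slab(int n, int my_pe, int n_pes, int *local_n, int *local_start)
{
    assert(n > 0 && n_pes > 0 && my_pe >= 0 && my_pe < n_pes);
    int block = (n + n_pes - 1) / n_pes;   /* ceil(n / n_pes) */
    int used  = (n + block - 1) / block;   /* ranks that actually own data */
    if (my_pe >= used) {
        *local_n = 0;
        *local_start = 0;
    } else {
        *local_start = my_pe * block;
        *local_n = (my_pe == used - 1) ? n - *local_start : block;
    }
}

/* Hypothetical stand-in for the five outputs of fftwnd_mpi_local_sizes()
   for a 3D nx * ny * nz transform. */
void local_sizes_3d(int nx, int ny, int nz, int my_pe, int n_pes,
                    int *local_nx, int *local_x_start,
                    int *local_ny_after_transpose,
                    int *local_y_start_after_transpose,
                    int *total_local_size)
{
    /* Before the transform the data is split along X ... */
    block_slab(nx, my_pe, n_pes, local_nx, local_x_start);
    /* ... and in transposed order it is split along Y. */
    block_slab(ny, my_pe, n_pes,
               local_ny_after_transpose, local_y_start_after_transpose);

    /* ASSUMPTION: enough room for whichever slab is larger. */
    int before = *local_nx * ny * nz;
    int after  = *local_ny_after_transpose * nx * nz;
    *total_local_size = before > after ? before : after;
}
```

Calling local_sizes_3d(36, 16, 14, rank, 2, ...) for rank 0 and rank 1 gives the FFTW values quoted above, so something like this could serve as a stopgap if the wrapper cannot supply the Y values.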

-Thank you all,

Hi,

Attached are the wrapper files that fix this problem. We will also include the fix in a future release.

Thanks,
Chao

Attachments:

fix4wrapper.tar.bz2 (2.06 KB)

Thank you for your help; however, the output still seems to be wrong. I deleted the old wrapper library to make sure I was not linking to it and rebuilt it using your fix, but it still did not work. To test further, I wrote my own test using just the MKL functions without the wrappers. Here are the relevant parts of the code:

LENGTHS(1) = 25
LENGTHS(2) = 15
LENGTHS(3) = 5
PRINT*,"LENGTHS", LENGTHS
STATUS = DftiCreateDescriptorDM(FFT_COMM,DESC,DFTI_DOUBLE, DFTI_COMPLEX,3,LENGTHS)
STATUS = DftiCommitDescriptorDM(DESC)

!***RETRIEVE VALUES***
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
!PRINT RETRIEVED VALUES
PRINT*, "DFTI values:START_X,NX,START_X_OUT,NX_OUT,SIZE"
PRINT *,START_X,NX,START_X_OUT,NX_OUT,SIZE

!***NOW MAKE TRANSPOSED AND REPEAT***
STATUS=DftiSetValueDM(DESC,DFTI_TRANSPOSE,DFTI_ALLOW)
STATUS=DftiCommitDescriptorDM(DESC)
!RETRIEVE VALUES
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
!PRINT RETRIEVED VALUES
PRINT*, "DFTI values after transpose:START_X,NX,START_X_OUT,NX_OUT,SIZE"
PRINT *,START_X,NX,START_X_OUT,NX_OUT,SIZE

The output I get is

DFTI values:START_X,NX,START_X_OUT,NX_OUT,SIZE
1 5 1 5 1875
DFTI values after transpose:START_X,NX,START_X_OUT,NX_OUT,SIZE
1 5 1 5 1875

As you can see from the output, the transpose is not being applied. Curiously, when the transform is computed I do get transformed data (tested on a square matrix). Is this a bug, or am I doing something wrong? I am using MKL 11.0.1.119, ifort.exe 13.0.1.119 Build 20121008, and icl.exe 13.0.1.119 Build 20121008, if that helps.

-Thanks

Hello,

Thanks for your report. We have verified that this is a bug in the function, and we plan to fix it in a future release.

Thanks,
Chao

Okay, thank you for confirming my suspicion. I look forward to the next release.

One last comment related to this problem. When I set up an FFT like this:

call fftwnd_f77_create_plan(fft_fwdplan, 3, fft_size, FFTW_FORWARD, FFTW_ESTIMATE+FFTW_IN_PLACE)

call fftwnd_f77_mpi(frw_plan, n_fields, mat(:, k), work, 1, FFTW_TRANSPOSED_ORDER)

The output found in mat(:,k) is jumbled in a weird way. I think this has to do with whether MKL outputs the data in row-major (C++) or column-major (Fortran) order.

I've attached example files for an FFT where fft_size = (18,14,12). The output produced by FFTW, which is the correct ordering for a Fortran program, is in FFTW_output.txt, and the MKL output is in MKL_output.txt.

The jumble seems to be FFTW( i ) = MKL( nx*z + x + y*nx*nz ), where x=0:nx-1, y=0:ny-1, z=0:nz-1, and i=0:nx*ny*nz-1. Please correct the output ordering for Fortran programs when you address the issue with local_size, so that my program does not lose time "un-jumbling" the data.
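
For reference, the un-jumbling step could look roughly like the sketch below. The formula above does not say how i relates to (x, y, z), so this assumes the usual Fortran column-major index i = x + nx*y + nx*ny*z; that is an assumption, not something confirmed by the output files. The cplx type and the unjumble function are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex element type, used only for this illustration. */
typedef struct { double re, im; } cplx;

/* Hypothetical helper: copy the MKL-ordered array into FFTW order using
   FFTW(i) = MKL(nx*z + x + y*nx*nz).
   ASSUMPTION: i = x + nx*y + nx*ny*z (Fortran column-major order). */
void unjumble(const cplx *mkl, cplx *fftw_order, int nx, int ny, int nz)
{
    assert(mkl != fftw_order);   /* out-of-place copy only */
    for (int z = 0; z < nz; z++) {
        for (int y = 0; y < ny; y++) {
            for (int x = 0; x < nx; x++) {
                size_t i = (size_t)x + (size_t)nx * ((size_t)y + (size_t)ny * (size_t)z);
                size_t j = (size_t)nx * (size_t)z + (size_t)x
                         + (size_t)y * (size_t)nx * (size_t)nz;
                fftw_order[i] = mkl[j];
            }
        }
    }
}
```

Under that assumption the mapping simply swaps the roles of y and z, which is why it costs a full extra pass over the data after every transform.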

Thank you,
Gabe

The files

Attachments:

fftw-output.txt (162.42 KB)
mkl-output.txt (115.85 KB)
