Performance of distributed-memory FFTs


Hi,

Are there any published benchmarks for the distributed-memory versions of the MKL FFT functions? I am trying to run an in-place 2D FFT on a moderately large problem, but with 16 MPI processes my machine essentially grinds to a halt. The MPI version of FFTW 2 handles the same problem without trouble.
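For context, distributed 2D FFTs such as FFTW 2's MPI interface typically use a slab decomposition: each process owns a contiguous block of rows, transforms them locally, takes part in a global transpose (an all-to-all exchange), and then transforms the columns it now owns. The sketch below is a serial NumPy simulation of that scheme under that assumption. It is not the attached example, not MKL's API, and not in-place; `slab_fft2` and `nprocs` are hypothetical names used only for illustration.

```python
import numpy as np

def slab_fft2(a, nprocs):
    """Serially simulate a slab-decomposed 2D FFT over `nprocs` hypothetical ranks.

    Not in-place: NumPy allocates new arrays at every step.
    """
    # Step 1: each simulated rank owns a contiguous block of rows (a slab)
    # and applies 1D FFTs along its rows (axis 1) locally.
    slabs = [np.fft.fft(s, axis=1) for s in np.array_split(a, nprocs, axis=0)]

    # Step 2: global transpose. On a real cluster this is the all-to-all
    # exchange in which every rank sends part of its slab to every other rank.
    transposed = np.concatenate(slabs, axis=0).T

    # Step 3: each rank now owns a block of the original columns and applies
    # 1D FFTs along them (axis 1 of the transposed array).
    cols = [np.fft.fft(s, axis=1) for s in np.array_split(transposed, nprocs, axis=0)]

    # Step 4: transpose back so the result has the original (rows, cols) layout.
    return np.concatenate(cols, axis=0).T


# Usage: 16 simulated ranks, checked against NumPy's single-process fft2.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 48)) + 1j * rng.standard_normal((64, 48))
assert np.allclose(slab_fft2(a, 16), np.fft.fft2(a))
```

The local 1D FFTs are cheap; the transpose in step 2 moves nearly the whole array between processes, so that exchange is usually where a distributed 2D FFT spends its time on a cluster.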

The example I was trying to run is attached. I am running it on a 32-processor SGI Altix with 64 GB of RAM, using SGI's Message Passing Toolkit. Thanks in advance.
