Hi there,

I need to speed up the calculation of a 2D FFT on a grid of, say, 512x512 by using a cluster.

The algorithm consists of 3 steps:
1. 1D FFTs on the rows of the matrix
2. Matrix transpose
3. Repetition of step 1

Single-processor tests show that step 2 (the matrix transpose) is the most time-consuming part of the algorithm. So, while parallelizing step 1 on the cluster is pretty straightforward, the matrix transpose on the cluster becomes the bottleneck.

There is a group of people from the East Coast who worked on this problem and reportedly achieved linear scaling of FFT performance on a cluster of up to 16 processors, which is quite impressive.

The question is: can the Intel cluster library handle the parallelization of a 2D FFT on a cluster?

Andrei
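
P.S. To make the three steps concrete, here is a minimal single-process sketch in pure Python. It is only an illustration of the row-FFT / transpose / row-FFT scheme, not a cluster implementation and not anything from the Intel library. The names fft1d, transpose and fft2d_transposed are my own hypothetical choices, and the sketch assumes the grid side is a power of two. Note that the result comes out in transposed layout (entry [k2][k1] instead of [k1][k2]); one more transpose would restore the usual orientation.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the k-th output of the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Step 2: swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d_transposed(grid):
    """2D FFT via the three steps; the output is in transposed layout."""
    rows = [fft1d(r) for r in grid]    # step 1: 1D FFT on every row
    cols = transpose(rows)             # step 2: matrix transpose
    return [fft1d(c) for c in cols]    # step 3: 1D FFT on every (new) row
```

In this form steps 1 and 3 are independent per row, so they can be split across processors without communication. Only step 2 requires every processor to exchange data with every other one, which is why it is the part I am worried about.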