I am currently using VSL's convolution functions to do 2D image convolution in C. Typically, I have 3 arrays and about 50 kernels. I implement convolution using the following pseudo code:

For each array from 1 to 3
{
    For each kernel from 1 to 50
    {
        Call vslsConvExecX(...);
    }
}

Typically, for convolution using FFT, a forward FFT is applied to both the array and the kernel. The two transforms are multiplied in frequency space, and a backward FFT is applied to the product. In the loop above, it seems that the forward FFT of each kernel is recomputed 3 times, once per array.

To eliminate this redundancy, I considered implementing the following pseudo code:

1. Apply forward FFT to all 3 arrays.
2. For each kernel from 1 to 50
   {
       Apply forward FFT to the kernel.
       For each array from 1 to 3
       {
           Multiply the array and kernel in frequency space. Store the result in a separate array.
           Apply backward FFT to that result to obtain the convolution.
       }
   }

Here are my questions:

1. Is my approach a reasonable way to speed up my program?

2. I implemented the new algorithm using the DFTI calls in MKL, but the resulting images appear garbled. I may have set some of the DFTI parameters incorrectly. Here is how I initialize the descriptor:

status = DftiCreateDescriptor(&descriptor, DFTI_SINGLE, DFTI_REAL, 2, length);
status = DftiSetValue(descriptor, DFTI_ORDERING, DFTI_BACKWARD_SCRAMBLED);
status = DftiSetValue(descriptor, DFTI_PACKED_FORMAT, DFTI_PACK_FORMAT);
status = DftiSetValue(descriptor, DFTI_BACKWARD_SCALE, 1.0f / (float)(width * height));
status = DftiCommitDescriptor(descriptor);

For debugging purposes, I applied a forward FFT followed by a backward FFT to an array and got the same array back. I repeated this with the kernel and got the same kernel back too. But if I multiply the array and kernel in frequency space, I get a garbled output array after the backward FFT. What am I doing wrong?

Thanks.