I am using the Intel MKL VSL functions to do image convolution. Normally, VSL_CONV_MODE_FFT mode runs faster for combinations of a large image and a large kernel. However, when I ran VslsConv2d in VSL_CONV_MODE_FFT mode on an 8k by 8k image of 4-byte floats with a 5 by 5 float kernel, the process used almost all of my computer's remaining memory (about 3 GB) and took a few minutes to finish. With a 7k by 7k float image and a 5 by 5 kernel, it takes only a few seconds to finish. Is this a bug in MKL? Thanks.