We measured the time it takes to run ippsConvolve_32f on an i5-4402E processor and found that ippsConvolve_32f takes about 5 times longer when we use AVX2 than when we use AVX.
We tried ippsConv_32f instead of ippsConvolve_32f and got the same results. We also tried all the available convolution algorithms (ippAlgAuto, ippAlgDirect and ippAlgFFT) and found that ippAlgAuto and ippAlgDirect give the same timings as each other, both with AVX and with AVX2.
With ippAlgFFT we get a small performance decrease under AVX. Under AVX2 we get a performance increase compared to ippAlgAuto under AVX2, but it still takes more time than ippAlgAuto under AVX.
The times we get, in microseconds:

                           AVX   AVX2
ippAlgAuto, ippAlgDirect     4     27
ippAlgFFT                    5      5
So there seems to be a bug in ippsConvolve_32f for ippAlgFFT when using AVX2.
AVX2 should be faster than AVX for every algorithm, but we see no improvement for ippAlgFFT, and for ippAlgDirect the performance drops severely.
We are using static linkage (#include <ipp_h9.h> for AVX2 and #include <ipp_g9.h> for AVX, placed before #include <ipp.h>).
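
For anyone who wants to reproduce the comparison, a timing harness along the following lines could be used. This is only a sketch under stated assumptions: conv_direct_ref and time_conv_us are hypothetical names, and the plain C loop is a reference stand-in for what a direct (time-domain) convolution computes, not IPP's implementation. The IPP call would be wrapped in a function of the same shape and passed in its place, once for the AVX build and once for the AVX2 build.

```c
#include <assert.h>
#include <time.h>

/* Common shape for any convolution under test (hypothetical typedef). */
typedef void (*conv_fn)(const float *src1, int len1,
                        const float *src2, int len2, float *dst);

/* Reference direct linear convolution in plain C.
   dst must have room for len1 + len2 - 1 samples. */
void conv_direct_ref(const float *src1, int len1,
                     const float *src2, int len2, float *dst)
{
    assert(len1 > 0 && len2 > 0);
    for (int n = 0; n < len1 + len2 - 1; ++n) {
        /* Only indices where both src1[k] and src2[n - k] exist. */
        int k_lo = (n >= len2) ? n - len2 + 1 : 0;
        int k_hi = (n < len1) ? n : len1 - 1;
        float acc = 0.0f;
        for (int k = k_lo; k <= k_hi; ++k)
            acc += src1[k] * src2[n - k];
        dst[n] = acc;
    }
}

/* Average CPU time per call in microseconds over `reps` calls.
   One untimed warm-up call is made first so that one-time setup
   and cold caches do not distort the average. */
double time_conv_us(conv_fn fn, const float *a, int la,
                    const float *b, int lb, float *dst, int reps)
{
    assert(reps > 0);
    fn(a, la, b, lb, dst);                 /* warm-up, not timed */
    clock_t t0 = clock();
    for (int i = 0; i < reps; ++i)
        fn(a, la, b, lb, dst);
    clock_t t1 = clock();
    return 1e6 * (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

Since single calls take only a few microseconds, averaging over many repetitions matters: clock() is too coarse to time one call on its own. Comparing the output of each build against conv_direct_ref also confirms that both code paths compute the same result before their timings are compared.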