Our program performs signal processing, and we use IPP to make it more efficient.
We currently run on an i7-620UE. To increase performance, we tested our application on newer-generation processors.
We tried a 3rd-generation processor (i7-3517UE, with AVX) and a 4th-generation processor (i5-4402E, with AVX2).
We configured all processors to run at the same frequency (1.6 GHz to 1.7 GHz), so we expected any performance improvement to come mainly from the new instruction sets (AVX and AVX2).
We downloaded the latest IPP version, 8.0, and we use static linkage (#include <ipp_h9.h> for AVX2 or <ipp_g9.h> for AVX, placed before #include <ipp.h>).
We saw a 30% performance improvement on the 3rd-generation processor (compared to the i7-620UE), so we expected about another 30% on the 4th-generation processor compared to the 3rd-generation one. Instead, the improvement was only about 8%.
We also ran the application on the same 4th-generation processor in two modes: using AVX and using AVX2. Using AVX2 gave us only an 8% performance improvement.
Vector sizes in our application are on the order of several hundred elements per operation.
Does it make sense that the improvement from AVX2 compared to AVX would be so low?
How can we measure the performance of IPP? Older versions of IPP included the perfsys tools, but in IPP 8.0 I did not find such tools for measuring IPP performance.
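In the absence of such a tool, one way to time a single call in isolation is a small loop like the sketch below. This is not an IPP utility: `kernel_under_test` and `time_kernel_ns` are hypothetical names, and the plain C multiply loop is only a placeholder for the real IPP function. It assumes a single-threaded run and uses the standard C `clock()`, so the repetition count has to be large enough to hide the timer's coarse resolution.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the IPP call being measured.
   Replace the loop with the real IPP function on the same buffers. */
static void kernel_under_test(const float *a, const float *b, float *dst, int len)
{
    for (int i = 0; i < len; ++i)
        dst[i] = a[i] * b[i];
}

/* Returns the average time per call in nanoseconds for vectors of `len`
   elements, measured over `reps` calls. Returns -1.0 if allocation fails. */
static double time_kernel_ns(int len, long reps)
{
    float *a   = malloc((size_t)len * sizeof *a);
    float *b   = malloc((size_t)len * sizeof *b);
    float *dst = malloc((size_t)len * sizeof *dst);
    if (!a || !b || !dst) {
        free(a); free(b); free(dst);
        return -1.0;
    }

    for (int i = 0; i < len; ++i) {
        a[i] = (float)i;
        b[i] = 0.5f;
    }

    /* Warm-up: bring code and data into the caches before timing. */
    for (int w = 0; w < 1000; ++w)
        kernel_under_test(a, b, dst, len);

    volatile float sink = 0.0f;  /* keeps the results live for the optimizer */
    clock_t t0 = clock();
    for (long r = 0; r < reps; ++r) {
        kernel_under_test(a, b, dst, len);
        sink += dst[r % len];
    }
    clock_t t1 = clock();
    (void)sink;

    free(a); free(b); free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)reps;
}
```

Calling, for example, `time_kernel_ns(512, 1000000)` once in the AVX build and once in the AVX2 build on the same machine would give a per-call figure for vectors of several hundred elements, separate from the rest of the application.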