AVX vs AVX2 performance

Hello.

Our program performs signal processing, and we use IPP to make it more efficient.

We currently work with the i7-620UE. We want to increase performance, so we tested our application on newer-generation processors.

We tried a 3rd-generation processor (i7-3517UE, with AVX) and a 4th-generation processor (i5-4402E, with AVX2).

We configured all processors to run at the same frequency (1.6 GHz to 1.7 GHz), so we expected any performance improvement to come mainly from the new instruction sets (AVX and AVX2).

We downloaded the latest IPP version, 8.0, and we use static linkage (#include <ipp_h9.h> for AVX2 or <ipp_g9.h> for AVX, placed before #include <ipp.h>).

We saw a 30% performance improvement on the 3rd-generation processor compared to the i7-620UE, so we expected about another 30% on the 4th-generation processor compared to the 3rd-generation one. Instead, the improvement was only about 8%.

We also ran the application on the same 4th-generation processor in two modes, once using AVX and once using AVX2. Using AVX2 gave us only an 8% performance improvement.

Vector sizes in our application are in the order of several hundred elements per operation.

Does it make sense that the improvement from AVX2 over AVX is so low?

How can we measure the performance of IPP? Older versions of IPP included the perfsys tools, but I could not find such tools for measuring IPP performance in IPP 8.0.

Thank you,

Itzhak

I posted another message at http://software.intel.com/en-us/forums/topic/472531 about the ippsConvolve_32f function, which is about 5 times slower with AVX2 than with AVX. It is important to emphasize that everything I wrote above in this topic (about only an 8% improvement for AVX2 compared to AVX) does not take the use of ippsConvolve_32f into account.

Hi Itzhak,

Thanks for letting us know. We will take a look at the function and let you know if there is any news.

Regarding AVX-optimized functions in IPP, please see http://software.intel.com/en-us/articles/haswell-support-in-intel-ipp. The function does not seem to be in the list.

Could you send me the list of IPP functions you use and describe how you run the test? A small test program (including the library link command) would be helpful.

Best Regards,

Ying

Ying, thank you for your help.

We use the following IPP functions: ippmTranspose_m_64f, ippsZero_32fc, ippsMulC_32fc, ippsMul_32fc, ippsFFTFwd_CToC_32fc, ippsAdd_32fc, ippsPowerSpectr_32fc, ippsConvolve_32f, ippsDiv_32f, ippsCopy_32f.

From the link you provided, I see that most of the functions we use are optimized for AVX2. The only functions I did not find in the list of optimized functions are ippmTranspose_m_64f, ippsZero_32fc, ippsMulC_32fc, and ippsConvolve_32f.

We measure performance with the __rdtsc intrinsic. We read the time stamp counter before and after the section of code we want to measure, which tells us how long that section takes.
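For readers who want to reproduce this kind of measurement, here is a minimal sketch of the before/after time stamp approach described above. It is not the code from ippspeed.c. The names workload and min_ticks are hypothetical, the workload is a plain sum of squares standing in for an IPP call, and taking the minimum over several runs is an added assumption meant to reduce noise from interrupts and cold caches. It builds only on x86 targets.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>     /* __rdtsc on MSVC */
#else
#include <x86intrin.h>  /* __rdtsc on GCC/Clang, x86 only */
#endif

/* Hypothetical stand-in for the section of code being measured. */
static float workload(const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += x[i] * x[i];
    }
    return acc;
}

/* Reads the time stamp counter before and after the workload and returns
   the smallest tick difference seen over `repeats` runs. The result is
   stored through a volatile pointer so the compiler cannot discard the
   computation being timed. */
static uint64_t min_ticks(const float *x, int n, int repeats,
                          volatile float *sink) {
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < repeats; ++r) {
        uint64_t start = __rdtsc();
        *sink = workload(x, n);
        uint64_t stop = __rdtsc();
        if (stop - start < best) {
            best = stop - start;
        }
    }
    return best;
}
```

Calling min_ticks with a buffer of a few hundred floats (the vector size mentioned earlier in this thread) gives a tick count that can be compared between builds, provided both runs use the same clock frequency.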

At the moment we measure a section of code that includes several IPP functions (both optimized and not optimized). I will measure how long the ippsFFTFwd_CToC_32fc function alone takes, since it is optimized for AVX2.

I will send you the code in a private message.

Thank you,

Itzhak

ippspeed.c - the source code I run to test the FFT

avx_log.txt - log with AVX enabled (#define CPU_AVX)

avx2_log.txt - log with AVX2 enabled (#define CPU_AVX)

You can see an improvement of only 8%.

The OS that we run is Tenasys INtime 5.1.13140.1.

The IPP version is 8.0.0.083.

The compiler is the 32-bit C/C++ optimizing compiler, version 15.00.30729.01 for 80x86.

Regards,

Itzhak

The files are attached here.

I did not succeed in uploading the log files, so I print them here:

With AVX enabled:

ippSP AVX (g9) 8.0.0 (r40040)
SSE    :Y
SSE2   :Y
SSE3   :Y
SSSE3  :Y
SSE41  :Y
SSE42  :Y
AVX    :Y
AVX2   :Y
----------
OS Enabled AVX :Y
AES            :Y
CLMUL          :Y
RDRAND         :Y
F16C           :Y
ippsMalloc_8u failed for init (size 0), order 10
fft taked 14828 time

With AVX2 enabled:

ippSP AVX2 (h9) 8.0.0 (r40040)
SSE    :Y
SSE2   :Y
SSE3   :Y
SSSE3  :Y
SSE41  :Y
SSE42  :Y
AVX    :Y
AVX2   :Y
----------
OS Enabled AVX :Y
AES            :Y
CLMUL          :Y
RDRAND         :Y
F16C           :Y
ippsMalloc_8u failed for init (size 0), order 10
fft taked 13636 time
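
As a check on the 8% figure, the two tick counts printed in the logs above (14828 with AVX, 13636 with AVX2) can be turned into a percentage and a speedup factor. The helper names below are hypothetical and the sketch assumes both counts were taken at the same clock frequency.

```c
/* Percentage reduction in elapsed ticks when going from the baseline
   build to the candidate build. */
static double improvement_percent(double baseline_ticks,
                                  double candidate_ticks) {
    return 100.0 * (baseline_ticks - candidate_ticks) / baseline_ticks;
}

/* Speedup factor: how many times faster the candidate build runs. */
static double speedup_factor(double baseline_ticks,
                             double candidate_ticks) {
    return baseline_ticks / candidate_ticks;
}
```

With the logged values, improvement_percent(14828, 13636) comes to roughly 8.04 and speedup_factor(14828, 13636) to roughly 1.087, which matches the reported 8%.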

Attachment: ippspeed.c (4.47 KB)

Hi Itzhak,

Why do you think that 8% is not good enough? For floating-point code, the only difference between AVX and AVX2 is the availability of the new FMA instructions; both AVX and AVX2 have 256-bit FP registers. The main advantage of the AVX2 ISA is for integer code and data types, where you can expect up to a 2x speedup. For FP code, 8% is a good speedup for AVX2 over AVX.

Regards, Igor
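
The register-width argument can be put into numbers with a small sketch. It is illustrative arithmetic only, not a benchmark, and the names lanes and width_gain are hypothetical. It relies on the fact that AVX widened floating-point operations to 256 bits while integer SIMD stayed at 128 bits until AVX2.

```c
/* Widest vector operation, in bits, per instruction set and data class. */
enum {
    AVX_FP_BITS   = 256,
    AVX2_FP_BITS  = 256,
    AVX_INT_BITS  = 128,
    AVX2_INT_BITS = 256
};

/* Number of elements of the given width that fit in one vector. */
static unsigned lanes(unsigned vector_bits, unsigned element_bits) {
    return vector_bits / element_bits;
}

/* Elements processed per instruction with AVX2, relative to AVX. */
static double width_gain(unsigned avx_bits, unsigned avx2_bits,
                         unsigned element_bits) {
    return (double)lanes(avx2_bits, element_bits) /
           (double)lanes(avx_bits, element_bits);
}
```

For 32-bit floats, lanes gives 8 elements per vector on both instruction sets, so width_gain is 1.0 and any FP gain has to come from elsewhere, such as FMA. For 32-bit integers the count goes from 4 to 8, so width_gain is 2.0, in line with the "up to 2x" figure for integer code.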

Hi Igor,

Thank you for your answer. This will help us decide whether to upgrade from a 3rd-generation processor to a 4th-generation processor.

Regards,

Itzhak
