I am testing the performance of both Intel IPP LZO and LZO (version 2.06), and I found that the IPP performance is much lower than that of LZO 2.06.
My test bed:
• CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (Sandy Bridge architecture)
• Memory and BIOS: 24 GB RAM, BIOS version 1.2.6
• OS: RH6.0, kernel 2.6.32-71.el6.x86_64
• Intel IPP main package: parallel_studio_xe_2011_sp1_update3_intel64
• LZO version: 2.06
• Compile option: gcc
Test method:
1. First, I configure the number of threads and the number of rounds for the compression. (The IPP internal thread mode is IppLZO1XST, but the benchmark program itself is multithreaded.)
2. Then the benchmark program reads the full file into memory and compresses it entirely in memory.
3. Finally, it reports the performance and the compression ratio.
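The per-thread part of the steps above can be sketched roughly as follows. This is only an illustration, not my actual benchmark: `compress_fn`, `copy_compress`, `run_rounds`, `bench_result` and `compression_ratio` are hypothetical names, and the copy-only stand-in is where a real wrapper around the IPP or LZO call would go.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical compressor signature: returns the number of bytes written to dst. */
typedef size_t (*compress_fn)(const unsigned char *src, size_t src_len,
                              unsigned char *dst, size_t dst_cap);

/* Stand-in "compressor" that only copies; replace with a real LZO wrapper. */
size_t copy_compress(const unsigned char *src, size_t src_len,
                     unsigned char *dst, size_t dst_cap)
{
    size_t n = src_len < dst_cap ? src_len : dst_cap;
    memcpy(dst, src, n);
    return n;
}

struct bench_result {
    double seconds;  /* wall-clock time spent in all rounds */
    size_t out_len;  /* output size reported by the last round */
};

/* Compress the same in-memory buffer `rounds` times and time the whole loop. */
struct bench_result run_rounds(compress_fn fn,
                               const unsigned char *src, size_t src_len,
                               unsigned char *dst, size_t dst_cap, int rounds)
{
    struct bench_result r = {0.0, 0};
    struct timespec t0, t1;
    int i;

    assert(fn != NULL && rounds > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < rounds; i++)
        r.out_len = fn(src, src_len, dst, dst_cap);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    r.seconds = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return r;
}

/* Compression ratio as original size divided by compressed size. */
double compression_ratio(size_t src_len, size_t out_len)
{
    return out_len ? (double)src_len / (double)out_len : 0.0;
}
```

Each benchmark thread would call `run_rounds` on its own output buffer. Timing the whole loop rather than each call keeps the timer overhead out of the measurement.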
Pseudocode for the main procedure of the Intel IPP LZO test program:
* The source file to be compressed is 16 MB, and the compression ratio is 1.5:1.
#define BUFSIZE (16 * 1024 * 1024) /* 16 MB */
/* Thread function: each thread compresses the same in-memory buffer opt_round_num times. */
void compress_per_thread(const char* pInFileName, int opt_round_num)
{
    Ipp8u *p_in_buffer = NULL, *p_out_buffer = NULL;
    IppLZOState_8u* pLZOState = NULL;
    Ipp32u src_len, dst_len, lzoSize;
    int fd_in, i;
    fd_in = open(pInFileName, O_RDONLY, 0);
    /* Query the state size, then allocate and initialize the single-threaded LZO1X encoder state. */
    ippsEncodeLZOGetSize(IppLZO1XST, BUFSIZE, &lzoSize);
    pLZOState = (IppLZOState_8u*)ippsMalloc_8u(lzoSize);
    ippsEncodeLZOInit_8u(IppLZO1XST, BUFSIZE, pLZOState);
    p_in_buffer = ippsMalloc_8u(BUFSIZE);
    p_out_buffer = ippsMalloc_8u(BUFSIZE + BUFSIZE / 10);
    /* The source file is exactly BUFSIZE bytes, so a single read loads the whole file into memory. */
    src_len = read(fd_in, p_in_buffer, BUFSIZE);
    /* opt_round_num is the number of rounds per thread, specified to tune performance. */
    for (i = 0; i < opt_round_num; i++)
        ippsEncodeLZO_8u(p_in_buffer, src_len, p_out_buffer, &dst_len, pLZOState);
}
The main procedure of the LZO (v2.06) test program is the same as the IPP LZO one, except that it calls the function lzo1x_1_compress to compress.
Performance reached its optimal value when the thread number is 24. But the performance of IPP LZO is 10.3 Gbps, while LZO 2.06 reaches 31.18 Gbps.
Why is IPP LZO so much slower than LZO 2.06 with the 16 MB test data?
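For reference, an aggregate Gbps figure like the ones above can be derived from the input size, the round count, the thread count and the elapsed time. The sketch below assumes decimal gigabits (1 Gbit = 1e9 bits) and counts uncompressed input bits. `throughput_gbps` is a hypothetical helper name, and the numbers in the usage note are made up for illustration, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregate throughput in gigabits per second, assuming 1 Gbit = 1e9 bits.
 * bytes_per_round is the uncompressed input size handled by one call. */
double throughput_gbps(size_t bytes_per_round, int rounds, int threads,
                       double seconds)
{
    double total_bits;

    assert(rounds > 0 && threads > 0 && seconds > 0.0);
    total_bits = (double)bytes_per_round * 8.0 * (double)rounds * (double)threads;
    return total_bits / seconds / 1e9;
}
```

For example, 24 threads each compressing a 16 MB (16777216-byte) buffer 100 times in 10 seconds gives `throughput_gbps(16777216, 100, 24, 10.0)`, about 32.2 Gbps.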
If I set the IPP thread mode to IppLZO1XMT (where the thread number equals the number of processors in the system by default) while my benchmark program's thread number also equals the number of processors in the system, I think the thread context switching will degrade performance.
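To put a number on that concern, here is a small sketch of the oversubscription arithmetic. It assumes the worst case, in which every benchmark thread's call runs its own full set of library worker threads at the same time. Whether IppLZO1XMT really behaves that way is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Number of online logical processors, or 1 if it cannot be determined. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Worst-case number of simultaneously runnable threads, assuming each
 * benchmark thread's call spawns lib_threads library workers. */
long total_threads(long app_threads, long lib_threads)
{
    assert(app_threads > 0 && lib_threads > 0);
    return app_threads * lib_threads;
}

/* Runnable threads per logical CPU; a value above 1.0 means oversubscription. */
double oversubscription(long app_threads, long lib_threads, long cpus)
{
    assert(cpus > 0);
    return (double)total_threads(app_threads, lib_threads) / (double)cpus;
}
```

With 24 benchmark threads and 24 library threads on 24 logical CPUs, `total_threads(24, 24)` is 576 and `oversubscription(24, 24, 24)` is 24.0. With the single-threaded mode the factor stays at 1.0.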