JPEG lossless 200% slower between v8 and t7 DLL

I found that JPEG lossless decoding is about 200% slower on the same brand of DELL machine with a different CPU. The test was done using the IPP sample program jpegview.exe. From the About menu, I can see the dispatch DLL is different between the two machines: one uses the v8 version, the other the t7 version. Both machines have Xeon CPUs, but the CPUs are detected as different types.

Here is the test data. The low-performance machine actually has more cores than the fast one. How can the JPEG performance be so different?

Fast: DELL 490, 2 CPUs, 4 cores
Intel Xeon CPU 5140 @ 2.33GHz
EM64T Family 6 Model 15 Stepping 6, GenuineIntel
ippjv8-6.0.dll
752x753x3, 8-bit JPEG lossless load timing: 18561.96 us

Slow: DELL 490, 2 CPUs, 8 cores
Intel Xeon CPU 3.20GHz
EM64T Family 15 Model 6 Stepping 4, GenuineIntel
ippjt7-6.0.dll
752x753x3, 8-bit JPEG lossless load timing: 48474.34 us


...............

This means we do have highly optimized (hand-tuned) code in the V8 library and do not have it in the T7 library (i.e., only compiler optimization is used in the T7 library for some functions used in lossless JPEG).

Regards,
Vladimir

Quoting - Vladimir Dudnik (Intel)

This means we do have highly optimized (hand-tuned) code in the V8 library and do not have it in the T7 library (i.e., only compiler optimization is used in the T7 library for some functions used in lossless JPEG).

Regards,
Vladimir

Hi Vladimir,

Thanks for the information.
I found another strange thing about the lossless compression. On the fast machine, I can run 4 threads at a time to speed up the compression about 4x. However, on the slow machine, even though it has 8 cores, using 4 threads only speeds it up about 2x.

Hm, IPP lossless JPEG is not threaded. Have you run several decoders in parallel?

Vladimir

Quoting - Vladimir Dudnik (Intel)
Hm, IPP lossless JPEG is not threaded. Have you run several decoders in parallel?

Vladimir

I cut an image into 4 pieces and run 4 encoders on 4 threads. My program is a medical image server; I need the lossless compression to reduce network traffic.

In the Windows performance monitor I do see four CPUs with high usage, but the compression speed does not scale up 4x, only about 2x with the t7 library.
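For reference, the strip-splitting approach described here can be sketched as below. This is only a minimal sketch of the tiling idea; `encode_strip()` is a hypothetical stand-in for the real IPP lossless JPEG encode call, and its 2:1 ratio is a placeholder:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Placeholder for the per-strip lossless encode; a real version would
// invoke the IPP JPEG encoder and return the compressed size.
static size_t encode_strip(const std::vector<unsigned char>& strip) {
    return strip.size() / 2;  // pretend 2:1 compression
}

// Split the image buffer into num_threads horizontal strips and encode
// each strip on its own thread; returns the per-strip compressed sizes.
static std::vector<size_t> encode_tiled(const std::vector<unsigned char>& image,
                                        int num_threads) {
    std::vector<size_t> sizes(num_threads, 0);
    std::vector<std::thread> workers;
    size_t strip_len = image.size() / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            auto first = image.begin() + t * strip_len;
            auto last  = (t == num_threads - 1) ? image.end()
                                                : first + strip_len;
            sizes[t] = encode_strip(std::vector<unsigned char>(first, last));
        });
    }
    for (auto& w : workers) w.join();
    return sizes;
}
```

One trade-off of this design: each strip is encoded independently, so the decoder can also parallelize, but the overall compression ratio drops slightly because statistics are no longer shared across strips.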

If your system is dual-processor with hyper-threading, then you can't expect a 4x speedup. Hyper-threading helps utilize internal processor resources better by letting another thread run while the first stalls, for example waiting for data from memory. But in general, you still have one physical processor.

Vladimir
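This can be put into rough numbers. The model below is purely illustrative (my own assumption that hyper-threading adds on the order of 10-30% on top of the physical-core count, not an Intel formula):

```cpp
#include <algorithm>
#include <cassert>

// Rough speedup model for compute-bound threads: scaling follows the
// physical cores; hyper-threading adds only a modest bonus on top,
// because two logical cores share one set of execution units.
static double expected_speedup(int threads, int physical_cores,
                               bool hyper_threading,
                               double ht_bonus = 0.25) {
    double base = std::min(threads, physical_cores);
    if (hyper_threading && threads > physical_cores)
        base *= 1.0 + ht_bonus;  // HT fills stalls; it does not double throughput
    return base;
}
```

With 4 threads on a true 4-core machine this predicts roughly 4x, while on a 2-physical-core machine with hyper-threading (4 logical cores) it predicts only about 2.5x, which is close to the roughly 2x observed in this thread.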

Quoting - Vladimir Dudnik (Intel)
If your system is dual-processor with hyper-threading, then you can't expect a 4x speedup. Hyper-threading helps utilize internal processor resources better by letting another thread run while the first stalls, for example waiting for data from memory. But in general, you still have one physical processor.

Vladimir

The slow machine I am talking about is a DELL 490 with 8 cores. Even with hyper-threading on, it should still have at least 4 real cores.

Correct; then the performance difference might simply be due to the difference in microarchitecture (i.e., Pentium 4 Xeon vs. Core 2 Quad), the difference in cache size, or the difference in processor frequency.

Vladimir

I think you should stop using the sample code from JpegView and move on to the UIC sample instead.
I have been busy with JPEG 2000 and have learned that JpegView is now legacy (outdated) and that I should use UIC.
I'm now busy with JPEG 2000 using the UIC sample code, and I must say it is much better: bugs have been removed and performance is better. I use a custom DLL linked with the T libraries for multithreading.

Quoting - Thomas Jensen
I think you should stop using the sample code from JpegView and move on to the UIC sample instead.
I have been busy with JPEG 2000 and have learned that JpegView is now legacy (outdated) and that I should use UIC.
I'm now busy with JPEG 2000 using the UIC sample code, and I must say it is much better: bugs have been removed and performance is better. I use a custom DLL linked with the T libraries for multithreading.

It is true that UIC is the future. However, UIC JPEG lossless is not any faster than JpegView, nor can it save 12-bit lossy JPEG.

Hello,

How did you conclude that UIC JPEG does not support the lossy 12-bit compression mode?
You can check this by opening any 12-bit image in the UIC picnic application; the option to save in JPEG Ext mode then becomes available in the Save As dialog.

So, to simplify things, we do not provide 16-bit to 12-bit conversion in the application. That means you can save only 12-bit images in the 12-bit JPEG Extended Baseline lossy mode.

Regards,
Vladimir

Quoting - Vladimir Dudnik (Intel)

Hello,

How did you conclude that UIC JPEG does not support the lossy 12-bit compression mode?
You can check this by opening any 12-bit image in the UIC picnic application; the option to save in JPEG Ext mode then becomes available in the Save As dialog.

So, to simplify things, we do not provide 16-bit to 12-bit conversion in the application. That means you can save only 12-bit images in the 12-bit JPEG Extended Baseline lossy mode.

Regards,
Vladimir

Hi Vladimir,
You are right. IPP JPEG does work on 12-bit images. My previous test was wrong because the images were 16-bit PGM files.

Quoting - gangli59
Hi Vladimir,
You are right. IPP JPEG does work on 12-bit images. My previous test was wrong because the images were 16-bit PGM files.

I finally got the 12-bit JPEG lossy compression to work. The trick is to set param.huffman_opt = 1 and use JPEG_EXTENDED mode. Why do we have to set param.huffman_opt = 1? In baseline mode, by contrast, I have to set param.huffman_opt = 0 to make 8-bit lossy compression work.

Hello,

For JPEG baseline mode you can use either huffman_opt = 0 or huffman_opt = 1. This parameter instructs the encoder to use the 'default' JPEG tables (if set to 0) or to generate Huffman tables based on the entropy statistics of the particular image (if set to 1), which results in a slightly better compression ratio but costs performance due to the additional steps the encoder has to do.
Because the 'default' Huffman tables assume 8-bit input data, they can't be used in JPEG Extended Baseline mode for 12-bit data. Thus, the encoder always has to generate Huffman tables in this case.

Regards,
Vladimir

Quoting - Vladimir Dudnik (Intel)

Hello,

For JPEG baseline mode you can use either huffman_opt = 0 or huffman_opt = 1. This parameter instructs the encoder to use the 'default' JPEG tables (if set to 0) or to generate Huffman tables based on the entropy statistics of the particular image (if set to 1), which results in a slightly better compression ratio but costs performance due to the additional steps the encoder has to do.
Because the 'default' Huffman tables assume 8-bit input data, they can't be used in JPEG Extended Baseline mode for 12-bit data. Thus, the encoder always has to generate Huffman tables in this case.

Regards,
Vladimir

Hi Vladimir,

Thanks for the explanation. I will try huffman_opt = 1 for baseline mode to see how much the compression ratio can be improved.
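To illustrate Vladimir's explanation of why huffman_opt = 1 can help: code lengths derived from an image's own symbol statistics beat a one-size-fits-all table on skewed data. The sketch below is a generic Huffman cost computation, not the IPP implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the total encoded size in bits when each symbol i (occurring
// freq[i] times) gets a Huffman code built from those very counts.
// Uses the classic min-heap construction: repeatedly merge the two
// lightest subtrees; each merge adds one bit to every leaf beneath it.
static size_t huffman_encoded_bits(const std::vector<size_t>& freq) {
    std::priority_queue<size_t, std::vector<size_t>,
                        std::greater<size_t>> pq;
    for (size_t f : freq)
        if (f > 0) pq.push(f);
    if (pq.size() == 1) return pq.top();  // single symbol: 1 bit each
    size_t total_bits = 0;
    while (pq.size() > 1) {
        size_t a = pq.top(); pq.pop();
        size_t b = pq.top(); pq.pop();
        total_bits += a + b;  // one extra bit for every symbol under this merge
        pq.push(a + b);
    }
    return total_bits;
}
```

For a skewed 4-symbol distribution like {90, 5, 3, 2}, this needs 115 bits where a fixed 2-bit-per-symbol code needs 200; for a uniform distribution, both cost the same, which matches the "slightly better ratio on real images" behavior described above.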
