January 24, 2008 12:48 AM PST
ijl15 vs ijl20 vs IPP jpeg decode performance
Hi,
We have been using ijl15 for decoding jpeg images for quite a while. We have now upgraded to IPP 5.3 and ijl20 and we are noticing a performance slowdown in decoding jpeg images.
The IJL versions are:
ijl15 - 1.5.4.36
ijl20 - 2.0.18.50
And this is the performance I get with exactly the same code for ijl15 and ijl20:
Using ijl15
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1125 ms
Average: 11.250000 ms per pic
Average: 1.255222 ms per 100.000 pixels
Average: 5.498104 ms per 100.000 bytes

Using ijl20
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1219 ms
Average: 12.190000 ms per pic
Average: 1.360102 ms per 100.000 pixels
Average: 5.957501 ms per 100.000 bytes
Is this a known issue? Can I do something to get better performance with ijl20?
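For reference, the per-picture and per-pixel figures in the output above can be reproduced with a short Python sketch. The function name `normalized_timings` is hypothetical and is not part of the poster's test program. The per-byte figure is left optional because the exact file size is not given in the thread.

```python
def normalized_timings(total_ms, n_pics, width, height, file_bytes=None):
    """Return per-picture and size-normalized decode times in milliseconds."""
    ms_per_pic = total_ms / n_pics
    result = {
        "ms_per_pic": ms_per_pic,
        # Time per 100,000 pixels, as in the benchmark output.
        "ms_per_100k_pixels": ms_per_pic / (width * height / 100_000),
    }
    if file_bytes is not None:
        # Only computed when the caller knows the exact file size.
        result["ms_per_100k_bytes"] = ms_per_pic / (file_bytes / 100_000)
    return result


# Totals taken from the benchmark output in this post.
ijl15 = normalized_timings(1125, 100, 1152, 778)
ijl20 = normalized_timings(1219, 100, 1152, 778)

# Relative slowdown of ijl20 versus ijl15, in percent.
slowdown_pct = (ijl20["ms_per_pic"] / ijl15["ms_per_pic"] - 1) * 100

print(f"ijl15: {ijl15['ms_per_pic']:.2f} ms per pic")
print(f"ijl20: {ijl20['ms_per_pic']:.2f} ms per pic")
print(f"ijl20 is {slowdown_pct:.1f}% slower than ijl15")
```

With these totals the slowdown comes out at roughly 8%, so the regression is measurable but modest.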
Could you please tell us which operating system and hardware platform you are working on?
By the way, IJL is a deprecated library. We now recommend moving your code to the IPP JPEG codec, which is part of the JPEGView sample (in the IPP image-codecs sample package). This IPP JPEG codec is the fastest JPEG codec. To achieve its performance it uses additional processor cores if they are available in the system. It also supports lossless mode operations (both encoding and decoding) and images with 16 bits per color channel.
For ReadImageFromJPEG() you need to initialize the m_param_jpeg struct; it contains some parameters which control decoder behaviour.
If you link with the IPP static libraries, please make sure you call the ippStaticInit function at the beginning of your application (this is not necessary with the DLLs).
Could you please attach the problem image files which demonstrate the issue? Our testing shows that the IPP JPEG codec outperforms the old IJL library on a Core 2 system by different factors, depending on the image compression mode.
The project "IPP-JPEGdecode" uses IPP 5.3 JPEG codec. I have also provided the executable: IPP-JPEGdecode.exe.
The project "IPP-IJL15-20-compare" uses IJL15 or IJL20. I have also provided the executables: IPP-IJL15.exe and IPP-IJL20.exe
I have also provided the test image. I have not provided any Intel files. If you need any of them just let me know.
I have tested on three different computers with about the same setup:
OS: MS Win XP Pro V.2002, SP 2 (5.1 build 2600)
Platform: Intel Pentium D CPU 3.40 GHz (2 CPUs), 2 GB RAM
The result is the same on all machines. IJL15 is fastest and JPEG codec is slowest:
IJL15: 13.5 ms per picture (50% CPU usage, 100% of one core)
IJL20: 14.5 ms per picture (50% CPU usage, 100% of one core)
JPEG codec: 20 ms per picture (100% CPU usage, 100% of each core)
So, in my test "JPEG codec" is actually much slower since it uses 100% of two cores, while IJL only uses 100% of one core...!
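The point about core usage can be put into numbers by multiplying the wall-clock time by the number of cores kept busy. The sketch below is only an approximation: it assumes that "100% of each core" means two fully occupied cores for the whole decode, and the helper name `core_ms_per_picture` is made up for illustration.

```python
def core_ms_per_picture(wall_ms, busy_cores):
    """Approximate CPU cost: wall-clock time multiplied by cores kept busy."""
    return wall_ms * busy_cores


# Wall-clock times and core usage as reported for the Pentium D machines.
results = {
    "IJL15": core_ms_per_picture(13.5, 1),
    "IJL20": core_ms_per_picture(14.5, 1),
    "JPEG codec": core_ms_per_picture(20.0, 2),
}

for name, cost in results.items():
    print(f"{name}: {cost:.1f} core-ms per picture")
```

Under that assumption the JPEG codec costs about three times as much CPU time per picture as IJL15 on this hardware, even though its wall-clock time is only about 1.5 times longer.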
Also, the colors are inverted or something in the images produced by the IPP JPEG codec. Maybe this is a hint to what's wrong!?
Please let me know if you need anything else from me!
Please note that the IPP JPEG codec (part of the JPEGView sample) uses OpenMP threading. To enable it you need to compile the codec with the Intel C/C++ compiler using the sample's build script, or, if you use a Visual Studio project, specify the /Qopenmp option for ICL or the /openmp option for the VC2005 compiler.
Second, it is better to link the codec with the IPP static libraries.
With all these conditions met, our tests show that the IPP JPEG codec is faster than the IJL 1.5, IJL 2.0 and IJG JPEG codecs.
Thanks for your support and patience in this matter Vladimir!
I'm using VC2005, and I have set the Language option OpenMP Support to Yes. I have also tried with static library linking. But I don't get any better performance than previously stated.
So, I also tried with the JPEGView sample. I made this modification in JPGViewDoc.cpp:
To get the time it takes to decode 100 pictures. I built the application with the original supplied build32.bat and Makefile. Still, the performance is worse compared to IJL!
These are the results, in milliseconds needed to decode one picture (Ijl15 fastest, JpegView slowest):

Picture    Ijl15  Ijl20  IPP 5.3  JpegView
640x480    4.3    5      6.25     6.4
1152x778   12.9   13.6   18.5     19.2
Isn't the sample application optimized? What can be wrong? How do we solve this in the easiest way?
I am working on an application which compresses the video memory buffer using the ijl20 JPEG library. It works perfectly fine with the 24-bit buffer generated by my display driver. To meet a new requirement, I am creating a 16-bit display driver. But JPEG compression fails during ijlWrite with an error of type either IJL_UNSUPPORTED_SUBSAMPLING or IJL_INVALID_JPEG_PROPERTIES. I have read in various places on the net that the old JPEG library doesn't support 16-bit channels whereas the new IPP JPEG 2000 codec does. I would be really grateful if you could clarify whether ijl20 supports 16-bit channels or not.
When talking about 16-bit images people can mean different things. Do you mean the YUY2 format or the RGB565 format? Or do you mean a format with 16 bits per color channel (some medical images use that)?
I would recommend that you migrate from IJL to the IPP JPEG codec found in the JPEGView sample. This code supports lossless compression with 16 bits per color channel and also supports the YUY2 format (specified as JC_YCBCR and JS_422).
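If the 16-bit display buffer turns out to be RGB565 (one of the possibilities raised above, not confirmed by the poster), one workaround would be to expand it to 24-bit RGB and feed it to the compression path that already works. This is not something proposed in the thread. The Python sketch below only illustrates the bit manipulation. The helper names are hypothetical, and the R, G, B byte order is an assumption that would have to match what the encoder expects.

```python
def rgb565_to_rgb888(pixel):
    """Expand one 16-bit RGB565 value into an (r, g, b) tuple of 8-bit values."""
    r5 = (pixel >> 11) & 0x1F  # top 5 bits: red
    g6 = (pixel >> 5) & 0x3F   # middle 6 bits: green
    b5 = pixel & 0x1F          # low 5 bits: blue
    # Replicate the high bits into the low bits so full scale maps to 255.
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return r, g, b


def expand_buffer(pixels565):
    """Convert an iterable of RGB565 values to packed 24-bit R,G,B bytes."""
    out = bytearray()
    for p in pixels565:
        out.extend(rgb565_to_rgb888(p))
    return bytes(out)


# Two hypothetical pixels: pure red and pure blue.
packed = expand_buffer([0xF800, 0x001F])
print(packed.hex())
```

Bit replication is used instead of a plain shift so that the brightest 5-bit or 6-bit value expands to 255 and not to 248 or 252.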
Vladimir, what is the reason for not including that JPEG code in the IPP itself if it is so great and fast as you say?
Another point to consider is that some people still have requirements for some projects to use only plain C code which is not possible if you implement the JPEG codec as C++ class.
Finally, I have noticed that IPP JPEG decompression speed is poor for some particular images while applications based on IJG code do not exhibit the same slowdown.
I really think you have to perform more testing with images created by different JPEG compressors and optimizers and that you should also evaluate performance on older CPUs (at least Pentium D).
-------- If you find my post helpful, please rate it and/or select it as a best answer where it applies. Thank you.
What do you mean by "including that JPEG code in the IPP itself"? There are two different things. The first is the binaries of the optimized low-level libraries, which provide a C interface. These are the IPP libraries. The IPP libraries themselves do not contain any high-level components like codecs, file readers, renderers and so on. The second is the set of IPP samples, which are available as source code and demonstrate how you can implement high-level components such as codecs, and how you can build an application which combines all of this into a kind of final solution, for example an image viewer application.
So all JPEG codecs available within the IPP samples (IJG, IJL and the IPP JPEG codec) are actually built on top of the same IPP libraries. Yes, the different codecs may use different sets of IPP low-level functions, and they may have different architectures and feature sets.
That's correct, an additional wrapper is required for the C++ based codec for those who need a pure C interface. But please note that the industry trend is to move from C to C++, and personally I think it will continue in the future.
Thanks for reporting your findings. Could you please attach a sample image which causes the performance issue you mention? That would help us to reproduce and investigate the issue.
Please note that the color conversion functions used in the original IJL library are not as precise as the counterparts we developed in IPP. The IJL functions use 8-bit fixed-point precision for YCbCr to RGB conversion, whereas the IPP JPEG color conversion functions use at least 14 bits. Another source of accuracy loss is the IDCT operation. In IPP we have a quite high-precision IDCT function. You can use a simple test to see the difference in JPEG decoding accuracy between the original IJL and the IPP JPEG codecs:
1. Choose an uncompressed reference image, say test.bmp.
2. Compress it with a reference JPEG encoder (you may use the original IJG cjpeg utility).
3. Decompress it to BMP with the original IJL codec.
4. Decompress it to BMP with any IPP codec.
5. Calculate the absolute differences with formulae like diff_ijl.bmp[i,j] = abs(test.bmp[i,j] - ijl.bmp[i,j]) and diff_ipp.bmp[i,j] = abs(test.bmp[i,j] - ipp.bmp[i,j]).
Then you can see in which case the absolute difference is higher.
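The comparison described above can be sketched in Python as follows. The images are represented as flat lists of 8-bit sample values to keep the example self-contained, so loading the BMP files is left out. The sample data is made up, and PSNR is added here only as a common single-number summary of the same differences.

```python
import math


def abs_diff(reference, decoded):
    """Per-sample absolute difference between two images of equal size."""
    if len(reference) != len(decoded):
        raise ValueError("images must have the same number of samples")
    return [abs(a - b) for a, b in zip(reference, decoded)]


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    diff = abs_diff(reference, decoded)
    mse = sum(d * d for d in diff) / len(diff)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak * peak / mse)


# Hypothetical toy data standing in for test.bmp and two decoder outputs.
reference = [10, 20, 30, 40]
decoded_a = [11, 19, 30, 42]  # stands in for the less precise decoder
decoded_b = [10, 20, 31, 40]  # stands in for the more precise decoder

print("diff A:", abs_diff(reference, decoded_a), "PSNR:", psnr(reference, decoded_a))
print("diff B:", abs_diff(reference, decoded_b), "PSNR:", psnr(reference, decoded_b))
```

The decoder with the smaller differences, and therefore the higher PSNR, reproduces the reference image more accurately.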
Vladimir, I will see if I can find that particular image for you and attach it.
As for the industry trend, I agree that we are moving towards C++, but C still has its place in embedded systems and in systems which have to interface with other code written in assembler or some other language.
As for the IPP, I find the lack of a complete JPEG codec implementation a bit disturbing. As you say, there are several ways to skin the cat (several samples) but none of them is a complete solution. It is OK to offer a low-level interface for those who need specialized features, but implementing a high-level interface in the library itself is equally important in my opinion.
For example, most applications need to decode whole image at once from the source buffer into the destination buffer to be able to display it. Why not also provide high-level API which does just that so that they don't have to keep reimplementing it?
While we are at it, does IPP support decoding embedded color profiles and applying them to the decoded image?
As I mentioned in a previous reply (02-27-2008, 2:25 AM), I have tested the JPEG decode performance with the JPEGView sample, using the original supplied build32.bat and Makefile. Still, the JPEG codec performs worse than IJL 15 and 20.
How come? Shouldn't the use of the original supplied build32.bat and Makefile ensure that JPEG codec is used in an optimal way and according to you be faster than both IJL 15 and 20?
We are still not able to reproduce that with your test application (I do not have a Pentium D at hand, so I was using a Core 2 Duo system).
You are correct, the original build script should give results like those we published in the IJL-IPP sample's documentation (on a similar system, of course), although we did use the Intel C/C++ compiler to test performance.
I got my hands on a Core 2 Duo system, and I still get the best performance with IJL15 and the worst with IPP. Milliseconds to decode a 1152x778 pixel picture:

IJL15  IJL20  IPP
11.7   12.8   14.5
Even if we disregard my own test program, I get the same result with your sample applications! I have tested this application:
Ipp\5.3.2\ipp-samples\image-codecs\jpeg-ijl\bin\win32_cl8\jpgview.exe
which I suppose uses IJL20, and this application:
Ipp\5.3.2\ipp-samples\image-codecs\jpegview\bin\win32_cl8\jpgview.exe
which I suppose uses the IPP decoder. The test results on a Core 2 Duo system with Vista Business 6.0 build 6000 are as follows (milliseconds to decode a 1152x778 pixel picture, read from 'USEC' in the program's status bar):

IJL20      IPP
12.8-15.8  13.7-31.7
We clearly see that IJL20 is faster with 12.8 ms compared to 13.7 ms for IPP. We also see that IPP has a much bigger span, with a highest value of 31.7 ms compared to 15.8 ms for IJL20. How come? What are your test results if you compare both applications?
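Because the span between the best and worst decode time matters here, it helps to record every run and report minimum, median and maximum instead of a single figure. The Python sketch below shows one way to do that. `fake_decode` is a hypothetical stand-in workload, not a call into IJL or IPP, and `time_runs` is an illustrative helper name.

```python
import statistics
import time


def time_runs(func, runs=100):
    """Call func repeatedly and summarize the wall-clock time per call in ms."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


def fake_decode():
    # Hypothetical stand-in for a JPEG decode call.
    sum(i * i for i in range(10_000))


stats = time_runs(fake_decode, runs=50)
print(f"min {stats['min']:.3f} ms, median {stats['median']:.3f} ms, "
      f"max {stats['max']:.3f} ms over {stats['runs']} runs")
```

A large gap between median and maximum usually points at occasional disturbances, such as thread start-up or other processes competing for a core, and not at the steady-state decode speed.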
If IPP still is the fastest for you, can you please provide me with test applications where IPP is faster than IJL, so I can try this on my Core 2 duo system?
I've attached the test program which I used this time (to rebuild it you will need the old IJL library, which we do not distribute anymore). A precompiled executable is located in the Release folder. If you specify no parameters, a generated image will be used for testing; otherwise you need to specify the name of a valid BMP file (24 bits per pixel).
I have tested it now, and as you say, it shows better performance with IPP than IJL15 on a Core 2 Duo system. However, there is more to consider!
1. You are using IPP 6.0.82.530. We are using IPP 5.3.85.467, which is the latest version released to us. IPP 5.3 and IJL15 have about the same performance on Core 2 Duo, and IJL15 is faster than IPP 5.3 on Pentium D! Why are you comparing with 6.0 when the latest release is 5.3.2? Why hasn't this been mentioned?
2. CPU usage. It's a fact that IPP 6.0 is faster than IJL15 and IPP 5.3 on Core 2 Duo. But it also doubles the total CPU usage from 50% to 100%. Since IPP 6.0 is twice as fast, it actually isn't faster at all if the CPU usage is also considered!
3. Pentium D. On my Pentium D machine I get the following results with a 2880x1944 image:
IJL15: 116 ms
IPP 5.3: 145 ms
IPP 6.0: 140 ms
IJL15 is clearly fastest. IPP 5.3 is actually second best, since it is only slightly slower than IPP 6.0 while consuming only 50% CPU. IPP 6.0 has 100% CPU usage.
Considering these three facts, I really can't see any performance improvement with IPP compared to IJL15, neither IPP 5.3 nor IPP 6.0 and neither on Pentium D nor Core 2 duo systems. What are your comments on this? Is there more to be considered?
So, basically you were able to reproduce the results which I have on my system (IPP JPEG is faster than the old IJL library). That's good.
1. The IPP 6.0 beta was just published; you can register and download it from the IPP main page. But just in case, I have also attached the same pre-built application linked with IPP 5.3. Please try it and let us know what the results are on your system. On our side it shows that IPP outperforms IJL just as the IPP 6.0 beta did in the previous application.
2. "...it actually isn't faster at all if the CPU usage also is considered!" There is probably some discrepancy in terms here. We call something faster when it can do more in the same amount of time. That says nothing about how computationally intensive it is to make things faster.
3. Unfortunately, I do not have a Pentium D system at hand, so I can't test it. By the way, one guess that just occurred to me: IJL was compiled with the Intel C/C++ compiler, whereas the application I attached in the previous post was compiled with VC2005, which might be one of the reasons for the worse performance. The second reason, as I already said elsewhere in this thread, is that we increased the arithmetic precision of the color conversion functions in IPP because many customers complained about relatively big rounding errors in IJL. That cost us some performance. You may compare the PSNR of the IJL and IPP JPEG codecs.
Taking all of that into account, I see that at least on a Core 2 system (where I can run this test) IPP does the work in 60 msec (compression) and 57 msec (decompression), while IJL takes 191 msec and 93 msec respectively. From my perspective, 60 msec to compress a 2Kx2K image is faster than 191 msec for the same job. I also expect that doing the work more than twice as fast will definitely require more processor resources.
Please find attached a precompiled test application built with the Intel C/C++ compiler and linked with IPP 5.3.
Yes, it's nice I managed to reproduce the results.
1. Ok, we'll stick to 5.3 until 6.0 is official. Your 5.3 app gave the same result as the 6.0.
2. Ok, I was a bit unclear. It is faster, as you say. However, we often decode several motion JPEG streams at once, and whenever we decode more than one stream at once we will not see the performance improvement, since the CPU load doubles. Anyway, we prefer IPP since it fully utilizes the CPU even when we decode only one stream, and also since the color conversion is improved.
3. The performance on the Pentium D system is the same with the 5.3 application I got from you...
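The multi-stream concern in point 2 can be illustrated with an idealized throughput model. This is a rough sketch under stated assumptions, not a measurement. It assumes that the threaded IPP decoder occupies both cores for the whole decode, that IJL occupies one core, that independent decodes scale perfectly across free cores, and that memory bandwidth is not a limit. The decode times are the Core 2 decompression figures quoted earlier in the thread (57 ms for IPP, 93 ms for IJL), and `aggregate_fps` is a hypothetical helper.

```python
def aggregate_fps(ms_per_frame, cores_per_decode, total_cores, streams):
    """Idealized frames per second summed over all streams."""
    # Number of decodes that can run at the same time without sharing a core.
    parallel_decodes = min(streams, total_cores // cores_per_decode)
    return parallel_decodes * 1000.0 / ms_per_frame


for streams in (1, 2):
    ipp = aggregate_fps(57, cores_per_decode=2, total_cores=2, streams=streams)
    ijl = aggregate_fps(93, cores_per_decode=1, total_cores=2, streams=streams)
    print(f"{streams} stream(s): IPP {ipp:.1f} fps, IJL {ijl:.1f} fps")
```

In this model the threaded decoder wins for a single stream, while two single-threaded decodes running side by side deliver more frames per second in total once there are at least as many streams as cores.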
I have one (perhaps last) problem! I haven't been able to get the performance improvement with the 5.3 application I rebuilt from your 6.0 application. I'm not sure why, but one guess is that it's because I don't have "libiomp5mt.lib" and had to remove it from Additional Dependencies for the linker. Could this be the case? If so, where can I get the library? If not, what else could be the problem? The application runs OK, but it only uses 50% CPU, so the performance is of course half that of your application, on both Pentium D and Core 2 Duo systems.