ijl15 vs ijl20 vs IPP jpeg decode performance

Hi,

We have been using ijl15 for decoding jpeg images for quite a while.
We have now upgraded to IPP 5.3 and ijl20 and we are noticing a performance slowdown in decoding jpeg images.

The versions of the ijl are:
ijl15 - 1.5.4.36
ijl20 - 2.0.18.50

What we do is basically:

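JPEG_CORE_PROPERTIES m_jcp;
BYTE* m_pData;
ijlInit(&m_jcp);
// Point the decoder at the compressed data before reading the header.
m_jcp.JPGBytes = pPicData;
m_jcp.JPGSizeBytes = dwPicDataSize;
ijlRead(&m_jcp, IJL_JBUFF_READPARAMS);
// Allocate the output buffer, then decode the whole image into it.
m_pData = (BYTE*)ippMalloc(dwSize);
m_jcp.DIBBytes = m_pData;
ijlRead(&m_jcp, IJL_JBUFF_READWHOLEIMAGE);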

And this is the performance I get with exactly the same code for ijl15 and ijl20:

Using ijl15
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1125 ms
Average: 11.250000 ms per pic
Average: 1.255222 ms per 100.000 pixels
Average: 5.498104 ms per 100.000 bytes

Using ijl20
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1219 ms
Average: 12.190000 ms per pic
Average: 1.360102 ms per 100.000 pixels
Average: 5.957501 ms per 100.000 bytes

Is this a known issue? Can I do something to get better performance with ijl20?

Br,
Robert

vladimir-dudnik (Intel)'s picture

Hi Robert,

Could you please tell us which operating system and hardware platform you are working on?

By the way, IJL is a deprecated library. We now suggest moving your code to the IPP JPEG codec, which is part of the JPEGView sample (in the IPP image-codecs sample package). This IPP JPEG codec is the fastest JPEG codec. To achieve its excellent performance it uses additional processor cores if they are available in the system. It also supports lossless mode (both encoding and decoding) and images with 16 bits per color channel.

Regards,
Vladimir

Hi Vladimir,

Thanks for your quick answer!

OS:
MS Win XP Pro V.2002, SP 2 (5.1 build 2600)

Platform:
Intel Pentium D CPU 3.40 GHz (2 CPUs),2 GB RAM

We will investigate using the IPP JPEG codec as well!

Br,
Robert

Hi again,

We have now done some testing with the IPP JPEG codec. However, we have not been able to improve performance. On the contrary it got slightly worse!

Using IPP JPEG codec version 5.3.1.064
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1375 ms
Average: 13.750000 ms per pic

Compare this to the ijl results:

Using ijl15
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1266 ms
Average: 12.660000 ms per pic

Using ijl20
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1344 ms
Average: 13.440000 ms per pic

The code is taken from the sample app JPEGView and looks as follows:

void GetImageFromJPEG()
{
    CIppImage m_image;
    PARAMS_JPEG m_param_jpeg;
    Ipp8u* pJPEG = 0;
    int JPEGSize;
    JERRCODE jerr;
    CMemBuffInput in;
    CFile jpeg;

    printf("Using IPP JPEG codec version 5.3.1.064\n");
    printf("0.896MP-0.20MB-1152x778.JPG\n");
    jpeg.Open("Pics\\0.896MP-0.20MB-1152x778.JPG", CFile::modeRead|CFile::typeBinary);

    // Read the whole JPEG file into memory so that only decoding is timed.
    JPEGSize = (int)jpeg.GetLength();
    pJPEG = (Ipp8u*)ippMalloc(JPEGSize);
    jpeg.Read(pJPEG,JPEGSize);
    jpeg.Close();
    jerr = in.Open(pJPEG,JPEGSize);
    m_image.Color(JC_UNKNOWN);

    int iterations = 100;
    DWORD dwStartTick = GetTickCount();
    long pos = 0;

    for(int i=0; i<iterations; i++)
    {
        jerr = ReadImageJPEG(&in,&m_param_jpeg,&m_image);
        // Rewind the memory stream to the start for the next iteration.
        in.TellPos(&pos);
        jerr = in.Seek(pos*-1, 1);
    }

    DWORD dwEndTick = GetTickCount();
    printf("Reading %d JPEG pics in: %lu ms\n", iterations, dwEndTick - dwStartTick);
    printf("Average: %f ms per pic\n", (double)(dwEndTick-dwStartTick) / iterations);
}

Any ideas of why we don't get better performance with IPP JPEG codec than with ijl15 and ijl20?
As stated above, we are using IPP version 5.3.1.064

Br,
Robert

vladimir-dudnik (Intel)'s picture

That looks strange.

For ReadImageJPEG() you need to initialize the m_param_jpeg struct; it contains parameters which control the decoder's behaviour.

If you link with the IPP static libraries, please make sure you call the ippStaticInit function at the beginning of your application (it is not necessary in the case of DLLs).

Vladimir

I have now tried different values for m_param_jpeg. However, the only parameter that improves the speed is m_param_jpeg.dct_scale.

When I change this from JD_1_1 to a smaller scale it gets faster:
JD_1_1 - 1750 ms
JD_1_2 - 1620 ms
JD_1_4 - 1320 ms
JD_1_8 - 1030 ms

With ijl15 it takes about 12 ms to decode one image, and that image is still full size, not scaled down as it is when dct_scale differs from JD_1_1.

And I am using DLLs, so that shouldn't be the problem either...

Any other suggestions?

Br,
Robert
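
One way to read the dct_scale timings above is to compare them with how much smaller the output image becomes. The sketch below computes the output size for each scale factor. It assumes that scaled dimensions are rounded up, as the IJG libjpeg does; the thread does not say how the IPP codec rounds, and ScaledDim and ScaledImageSize are hypothetical helper names, not part of any Intel API.

```cpp
#include <cassert>
#include <cstdint>

// Output size of one image dimension when decoding at 1/denom scale.
// Assumption: the result is rounded up (as IJG libjpeg does); the IPP
// codec discussed in the thread may round differently.
inline int ScaledDim(int dim, int denom)
{
    assert(dim > 0 && (denom == 1 || denom == 2 || denom == 4 || denom == 8));
    return (dim + denom - 1) / denom;
}

struct ScaledSize
{
    int width;
    int height;
    std::int64_t pixels;
};

// Width, height and pixel count of the decoded image at 1/denom scale.
inline ScaledSize ScaledImageSize(int width, int height, int denom)
{
    const int w = ScaledDim(width, denom);
    const int h = ScaledDim(height, denom);
    return { w, h, static_cast<std::int64_t>(w) * h };
}
```

For the 1152x778 test picture, ScaledImageSize(1152, 778, 8) gives 144x98, about 1.6% of the full-size pixel count, while the measured time only fell from 1750 ms to 1030 ms (about 59%). That suggests most of the decode time is spent in stages that do not shrink with the output size, such as entropy decoding, although nothing in the thread confirms this.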

vladimir-dudnik (Intel)'s picture

Hi Robert,

Could you please attach the problem image files which demonstrate the issue? Our testing shows that the IPP JPEG codec outperforms the old IJL library on a Core 2 system by different factors, depending on the image compression mode.

Regards,
Vladimir

Hi Vladimir,

Thanks for looking into this issue. We really appreciate it!

I've attached the picture with which we've done the testing.

Br,
Robert

Attachment: 0.896MP-0.20MB-1152x778.JPG (199.82 KB)

Hi again,

Did you have any chance to look into the issue?

Do you need any other information from me?

Br,
Robert

vladimir-dudnik (Intel)'s picture

Hi Robert,

We were not able to reproduce the performance issue with your image. Could you please provide your test project so we can see what the reason might be?

Regards,
Vladimir

Hi Vladimir,

I attach my test projects for you to look into.

The project "IPP-JPEGdecode" uses IPP 5.3 JPEG codec.
I have also provided the executable: IPP-JPEGdecode.exe.

The project "IPP-IJL15-20-compare" uses IJL15 or IJL20.
I have also provided the executables: IPP-IJL15.exe and IPP-IJL20.exe

I have also provided the test image.
I have not provided any Intel files. If you need any of them just let me know.

I have tested on three different computers with about the same setup:
OS: MS Win XP Pro V.2002, SP 2 (5.1 build 2600)
Platform: Intel Pentium D CPU 3.40 GHz (2 CPUs),2 GB RAM

The result is the same on all machines. IJL15 is fastest and JPEG codec is slowest:
IJL15: 13.5 ms per picture (50% CPU usage, 100% of one core)
IJL20: 14.5 ms per picture (50% CPU usage, 100% of one core)
JPEG codec: 20 ms per picture (100% CPU usage, 100% of each core)

So, in my test "JPEG codec" is actually much slower since it uses 100% of two cores, while IJL only uses 100% of one core...!

Also, the colors are inverted or something in the images produced by the IPP JPEG codec. Maybe this is a hint to what's wrong!?

Please let me know if you need anything else from me!

Br,
Robert

Attachment: Intel.zip (6.8 KB)

vladimir-dudnik (Intel)'s picture

Hi Robert,

Please note that the IPP JPEG codec (part of the JPEGView sample) uses OpenMP threading. To enable it you need to compile the codec with the Intel C/C++ compiler using the sample's build script, or, if you use a Visual Studio project, specify the /Qopenmp option for ICL or the /openmp option for the VC2005 compiler.

Second, it is better to link the codec with the IPP static libraries.

With all these conditions met, our tests show that the IPP JPEG codec is the fastest of the codecs compared (IJL 1.5, IJL 2.0 and the IJG JPEG codec).

Regards,
Vladimir
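
Whether OpenMP was really switched on for a given build can be checked from the code itself, because a compiler that has OpenMP enabled defines the standard _OPENMP macro. The following is a minimal sketch of such a check; the function names are hypothetical and it is not taken from the JPEGView sample. Note that the macro only describes the translation unit being compiled, so to check the codec this would have to be built with the same options as the codec sources, not just the application.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// True when this translation unit was compiled with OpenMP enabled
// (/openmp for VC2005, /Qopenmp for ICL, -fopenmp for GCC and Clang).
inline bool BuiltWithOpenMP()
{
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Number of threads a parallel region may use; 1 when OpenMP is off.
inline int MaxWorkerThreads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Counts loop iterations. With OpenMP the iterations are shared among
// threads and combined by the reduction; without it the pragma is
// skipped and the loop runs serially. The result is n in both cases.
inline int CountIterations(int n)
{
    assert(n >= 0);
    int count = 0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:count)
#endif
    for (int i = 0; i < n; ++i)
        count += 1;
    return count;
}
```

If BuiltWithOpenMP() returns false, or MaxWorkerThreads() returns 1 on a dual-core machine, the threaded code paths cannot use the second core, which would match a decoder that stays at 50% CPU usage.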

Thanks for your support and patience in this matter Vladimir!

I'm using VC2005, and I have set the Language option OpenMP Support to Yes.
I have also tried with static library linking. But I don't get any better performance than previously stated.

So, I also tried with the JPEGView sample. I made this modification in JPGViewDoc.cpp:

DWORD dwStartTick = GetTickCount();
long pos = 0;
for(int i=0; i<100; i++)
{
    jerr = ReadImageJPEG(&in,&m_param_jpeg,&m_image);
    in.TellPos(&pos);
    jerr = in.Seek(pos*-1, 1);
}
string.Format("Decoded 100 jpeg pics in: %lu ms", GetTickCount()-dwStartTick);
AfxMessageBox(string,MB_OK);

I did this to get the time it takes to decode 100 pictures. I built the application with the originally supplied build32.bat and Makefile. Still, the performance is worse than with IJL!

These are the results, in milliseconds to decode one picture. Ijl15 is fastest and JpegView slowest:

Picture     Ijl15   Ijl20   IPP 5.3   JpegView
640x480     4.3     5       6.25      6.4
1152x778    12.9    13.6    18.5      19.2

Isn't the sample application optimized?
What can be wrong?
How do we solve this in the easiest way?

Br,
Robert

Hi,

I am working on an application which compresses the video memory buffer using the ijl20 JPEG library. It works perfectly fine with the 24-bit buffer generated by my display driver. To meet a new requirement, I am creating a 16-bit display driver. However, JPEG compression now fails in ijlWrite with either IJL_UNSUPPORTED_SUBSAMPLING or IJL_INVALID_JPEG_PROPERTIES. I have read in various places on the net that the old JPEG library doesn't support 16-bit channels whereas the new IPP JPEG 2000 codec does. I would be really grateful if you could clarify whether ijl20 supports 16-bit channels or not.

Regards,

Bhavani Madiraju

Hi

If I run the same code (Intel.zip), it shows:

error when decoding jpeg data-buffer too small

What is the reason?

vladimir-dudnik (Intel)'s picture

When talking about 16-bit images, people can mean different things. Do you mean the YUY2 format, the RGB565 format, or a format with 16 bits per color channel (some medical images use that)?

I would recommend that you migrate from IJL to the IPP JPEG codec found in the JPEGView sample. This code supports lossless compression with 16 bits per color channel and also supports the YUY2 format (specified as JC_YCBCR and JS_422).

Regards,
Vladimir
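
If the 16-bit display buffer in question turns out to be RGB565 (the thread does not say), one workaround is to expand it to 24 bits per pixel before handing it to an encoder that already works with 24-bit input. The sketch below shows that expansion. The function name is hypothetical, and the BGR byte order is an assumption chosen to match a typical Windows DIB; it is not taken from IJL or IPP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands RGB565 pixels (5 bits red, 6 bits green, 5 bits blue) into
// 24-bit pixels stored as B, G, R bytes.
// Assumption: red occupies the top 5 bits of each 16-bit value.
std::vector<std::uint8_t> Rgb565ToBgr24(const std::uint16_t* src, std::size_t pixelCount)
{
    assert(src != nullptr || pixelCount == 0);
    std::vector<std::uint8_t> dst;
    dst.reserve(pixelCount * 3);
    for (std::size_t i = 0; i < pixelCount; ++i)
    {
        const unsigned p  = src[i];
        const unsigned r5 = (p >> 11) & 0x1Fu;
        const unsigned g6 = (p >> 5) & 0x3Fu;
        const unsigned b5 = p & 0x1Fu;
        // Replicate the high bits into the low bits so that the
        // maximum 5- or 6-bit value maps to 255 rather than 248 or 252.
        dst.push_back(static_cast<std::uint8_t>((b5 << 3) | (b5 >> 2)));
        dst.push_back(static_cast<std::uint8_t>((g6 << 2) | (g6 >> 4)));
        dst.push_back(static_cast<std::uint8_t>((r5 << 3) | (r5 >> 2)));
    }
    return dst;
}
```

The extra copy costs time and memory, so this is only a stopgap compared with a codec that accepts the 16-bit format directly.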

Igor Levicki's picture

Vladimir, what is the reason for not including that JPEG code in the IPP itself if it is so great and fast as you say?

Another point to consider is that some people still have requirements for some projects to use only plain C code which is not possible if you implement the JPEG codec as C++ class.

Finally, I have noticed that IPP JPEG decompression speed is poor for some particular images while applications based on IJG code do not exhibit the same slowdown.

I really think you have to perform more testing with images created by different JPEG compressors and optimizers and that you should also evaluate performance on older CPUs (at least Pentium D).

-- Regards, Igor Levicki If you find my post helpful, please rate it and/or select it as a best answer where applicable. Thank you.
vladimir-dudnik (Intel)'s picture

Hi Igor,

What do you mean by "including that JPEG code in the IPP itself"? There are two different things. The first is the binaries of the optimized low-level libraries, which provide a C interface. These are the IPP libraries, and they do not contain any high-level components like codecs, file readers, renderers and so on. The second is the set of IPP samples, which are available as source code and demonstrate how you can implement high-level components like codecs, and how you can build an application which combines all of this into a kind of final solution, for example an image viewer application.

So all JPEG codecs available within the IPP samples (IJG, IJL and the IPP JPEG codec) are actually built on top of the same IPP libraries. That said, the different codecs may use different sets of IPP low-level functions, and they may have different architectures and feature sets.

That's correct, an additional wrapper is required around the C++ based codec for those who need a pure C interface. But please note that the industry trend is to move from C to C++, and personally I think it will continue in the future.

Thanks for reporting your findings. Could you please attach a sample image which causes the performance issue you mention? That would help us to reproduce and investigate the issue.

Please note that the color conversion functions used in the original IJL library are not as precise as the counterparts we developed in IPP. The IJL functions use 8-bit fixed-point precision for YCbCr to RGB conversion, whereas the IPP JPEG color conversion functions use at least 14 bits. Another source of accuracy loss is the IDCT operation; in IPP we have a quite high-precision IDCT function. You may use a simple test to see the difference in accuracy of JPEG decoding between the original IJL and the IPP JPEG codecs:
1. Choose some uncompressed reference image, say test.bmp.
2. Compress it with a reference JPEG encoder (you may use the original IJG cjpeg utility).
3. Decompress it to BMP with the original IJL codec.
4. Decompress it to BMP with any IPP codec.
5. Calculate the absolute difference with a formula like diff_ijl.bmp[i,j] = abs(test.bmp[i,j] - ijl.bmp[i,j]) and diff_ipp.bmp[i,j] = abs(test.bmp[i,j] - ipp.bmp[i,j]).

Then you can see in which case the absolute difference is higher.

Regards,
Vladimir
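
The comparison in step 5 can be sketched as follows. It assumes that the reference image and the decoded image have already been loaded into equally sized 8-bit sample buffers; loading the BMP files is left out. DiffStats and CompareToReference are hypothetical names. PSNR is added as a single summary figure and is computed here with the usual definition for 8-bit samples, 10 * log10(255^2 / MSE).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct DiffStats
{
    int maxAbsDiff;     // largest per-sample absolute difference
    double meanAbsDiff; // average per-sample absolute difference
    double psnrDb;      // peak signal-to-noise ratio in dB (infinite if identical)
};

// Compares a decoded image against the uncompressed reference, sample by sample.
DiffStats CompareToReference(const std::vector<std::uint8_t>& reference,
                             const std::vector<std::uint8_t>& decoded)
{
    assert(reference.size() == decoded.size() && !reference.empty());
    int maxDiff = 0;
    double sumAbs = 0.0;
    double sumSq = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i)
    {
        const int d = std::abs(static_cast<int>(reference[i]) - static_cast<int>(decoded[i]));
        if (d > maxDiff)
            maxDiff = d;
        sumAbs += d;
        sumSq += static_cast<double>(d) * d;
    }
    const double n = static_cast<double>(reference.size());
    const double mse = sumSq / n;
    const double psnr = (mse == 0.0)
        ? std::numeric_limits<double>::infinity()
        : 10.0 * std::log10(255.0 * 255.0 / mse);
    return { maxDiff, sumAbs / n, psnr };
}
```

Running this once with the IJL output and once with the IPP output against the same reference gives two sets of numbers; the decoder with the lower differences and the higher PSNR is the more accurate one for that image.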

Igor Levicki's picture

Vladimir, I will see if I can find that particular image for you and attach it.

As for the industry trend, I agree that we are moving towards C++, but C still has its place in embedded systems and in systems which have to interface with other code written in assembler or some other language.

As for the IPP, I find the lack of a complete JPEG codec implementation a bit disturbing. As you say, there are several ways to skin the cat (several samples), but none of them is a complete solution. It is ok to offer a low-level interface for those who need specialized features, but implementing a high-level interface in the library itself is equally important in my opinion.

For example, most applications need to decode a whole image at once from the source buffer into the destination buffer to be able to display it. Why not also provide a high-level API which does just that, so that they don't have to keep reimplementing it?

While we are at it, does IPP support decoding embedded color profiles and applying them to the decoded image?

-- Regards, Igor Levicki

Hi Vladimir,

As I mentioned in a previous reply (02-27-2008, 2:25 AM), I have tested the JPEG decode performance with the JPEGView sample, using the originally supplied build32.bat and Makefile. Still, the JPEG codec performs worse than IJL 15 and 20.

How come?
Shouldn't using the originally supplied build32.bat and Makefile ensure that the JPEG codec is used in an optimal way and, according to you, is faster than both IJL 15 and 20?

vladimir-dudnik (Intel)'s picture

Hi Robert,

We are still not able to reproduce that with your test application (I do not have a Pentium D at hand, so I was using a Core 2 Duo system).

You are correct, the original build script should give results like those we published in the IJL-IPP sample's documentation (on a similar system, of course). However, we did use the Intel C/C++ compiler for performance testing.

Regards,
Vladimir

Hi Vladimir,

I got my hands on a Core 2 Duo system, and I still get the best performance with IJL15 and the worst with IPP.
Milliseconds to decode the 1152x778 pixel picture:
IJL15   IJL20   IPP
11.7    12.8    14.5

Even if we disregard my own test program, I get the same result with your sample applications!
I have tested this application: Ipp5.3.2\ipp-samples\image-codecs\jpeg-ijl\bin\win32_cl8\jpgview.exe
which I suppose uses IJL20,
and this application: Ipp5.3.2\ipp-samples\image-codecs\jpegview\bin\win32_cl8\jpgview.exe
which I suppose uses the IPP decoder.
The test results on a Core 2 Duo system with Vista Business 6.0 build 6000 are as follows (milliseconds to decode the 1152x778 pixel picture, read from 'USEC' in the program's status bar):
IJL20        IPP
12.8-15.8    13.7-31.7

We clearly see that IJL20 is faster, with 12.8 ms compared to 13.7 ms for IPP.
We also see that IPP has a much bigger span, with a highest value of 31.7 ms compared to 15.8 ms for IJL20.
How come?
What are your test results if you compare both applications?

If IPP still is the fastest for you, can you please provide me with test applications where IPP is faster than IJL, so I can try this on my Core 2 duo system?

Br,
Robert
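
GetTickCount typically has a resolution of only about 10 to 16 ms, which is close to the per-picture decode times in this thread, so single-run figures and min/max spans taken from it should be treated with care. Below is a minimal sketch of a timing helper that records the minimum, maximum and average over many runs using std::chrono::steady_clock. It needs a C++11 compiler, which is newer than the VC2005 toolchain used in the thread, and TimeRuns is a hypothetical name. The callable passed in stands for one decode call.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

struct TimingStats
{
    double minMs;
    double maxMs;
    double avgMs;
};

// Runs work() the given number of times and reports per-run timings.
template <typename Fn>
TimingStats TimeRuns(Fn&& work, int iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    double minMs = 0.0;
    double maxMs = 0.0;
    double totalMs = 0.0;
    for (int i = 0; i < iterations; ++i)
    {
        const Clock::time_point start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
        const double ms = elapsed.count();
        // The first run initializes min and max; later runs update them.
        minMs = (i == 0) ? ms : std::min(minMs, ms);
        maxMs = (i == 0) ? ms : std::max(maxMs, ms);
        totalMs += ms;
    }
    return { minMs, maxMs, totalMs / iterations };
}
```

A large gap between the minimum and the maximum, like the 13.7 to 31.7 ms span reported above, usually points to interference from other activity on the machine or to thread start-up cost rather than to the steady-state speed of the decoder, so the minimum and the average are often the more useful figures to compare.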

vladimir-dudnik (Intel)'s picture

Hi Robert,

I've attached the test program which I used this time (to rebuild it you will need the old IJL library; we do not distribute it anymore). A precompiled executable is located in the Release folder. If you specify no parameters, a generated image will be used for testing; otherwise you need to specify the name of a valid BMP file (24 bits per pixel).

Regards,
Vladimir

Attachment: ijl_vs_ipp.zip (452.79 KB)

Vladimir, thanks for the test program!

I have tested it now, and as you say, it shows better performance with IPP than IJL15 on a Core 2 Duo system.
However, there is more to consider!

1.
You are using IPP 6.0.82.530.
We are using IPP 5.3.85.467, which is the latest version released to us.
IPP 5.3 and IJL15 have about the same performance on Core 2 Duo, and IJL15 is faster than IPP 5.3 on Pentium D!
How come you are comparing with 6.0, when the latest release is 5.3.2?
Why hasn't this been mentioned?

2.
CPU usage.
It's a fact that IPP 6.0 is faster than IJL15 and IPP 5.3 on Core 2 duo. But it also doubles the total CPU usage from 50% to 100%. Since IPP 6.0 is twice as fast, it actually isn't faster at all if the CPU usage also is considered!

3.
Pentium D.
On my pentium D machine I have the following results with a 2880x1944 image:
IJL15: 116 ms
IPP 5.3: 145 ms
IPP 6.0: 140 ms
IJL15 is clearly fastest.
IPP 5.3 is actually second best, since it is only slightly slower than IPP 6.0 while consuming only 50% CPU. IPP 6.0 has 100% CPU usage.

Considering these three facts, I really can't see any performance improvement with IPP compared to IJL15, with either IPP 5.3 or IPP 6.0, on either Pentium D or Core 2 Duo systems.
What are your comments on this? Is there more to be considered?

Br,
Robert
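
The CPU usage argument above can be put into numbers with a very rough model: multiply the wall-clock time per image by the number of cores kept busy to get a cost in core-milliseconds, then ask how many images per second the whole machine could sustain if enough independent streams were decoded to fill every core. The sketch assumes that 50% CPU on a dual-core machine means one fully busy core, and that independent streams scale perfectly across cores, which ignores memory bandwidth and scheduling effects. The function names are hypothetical.

```cpp
#include <cassert>

// Rough CPU cost of decoding one image: wall-clock milliseconds
// multiplied by the number of cores kept busy during the decode.
inline double CoreMsPerImage(double wallMs, double coresBusy)
{
    assert(wallMs > 0.0 && coresBusy > 0.0);
    return wallMs * coresBusy;
}

// Images per second the whole machine could sustain when enough
// independent streams are decoded to keep all cores busy.
// Assumption: perfect scaling across cores.
inline double AggregateImagesPerSecond(double wallMs, double coresBusy, int totalCores)
{
    assert(totalCores > 0);
    return 1000.0 * totalCores / CoreMsPerImage(wallMs, coresBusy);
}
```

With the Pentium D figures for the 2880x1944 image, IJL15 (116 ms on one core) costs 116 core-ms per image, or about 17.2 images per second across two cores, while IPP 6.0 (140 ms on two cores) costs 280 core-ms per image, or about 7.1 images per second. Under this model a threaded decoder only helps multi-stream workloads if its wall-clock gain outweighs the extra cores it occupies.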

vladimir-dudnik (Intel)'s picture

Hi Robert,

So, basically you were able to reproduce the results which I have on my system (IPP JPEG is faster than the old IJL library). That's good.

1. The IPP 6.0 beta was just published; you can register and download it from the IPP main page. But just in case, I have also attached the same pre-built application linked with IPP 5.3. Please try it and let us know what the results are on your system. On our side it shows that IPP outperforms IJL, just as the IPP 6.0 beta did in the previous application.

2. "...it actually isn't faster at all if the CPU usage also is considered!". There is probably some discrepancy in terms. We call something faster when it can do more in the same amount of time. That says nothing about how computationally intensive it is to make things faster.

3. Unfortunately, I do not have a Pentium D system at hand, so I can't test it. By the way, one guess I just had: IJL was compiled with the Intel C/C++ compiler, whereas my application attached in the previous post was compiled with VC2005; that might be one of the reasons for the worse performance. The second reason, as I already said somewhere in this thread, is that we increased the arithmetic precision of the color conversion functions in IPP because many customers complained about relatively big rounding errors in IJL. That cost us some performance. You may compare PSNR for IJL and the IPP JPEG codec.

Taking all of that into account, I see that at least on a Core 2 system (where I can run this test) IPP does the work in 60 msec (compression) and 57 msec (decompression), while IJL takes 191 msec and 93 msec respectively. From my perspective, 60 msec to compress a 2Kx2K image is faster than 191 msec for the same job. I also expect that doing the work more than twice as fast will definitely require more processor resources.

Please find attached precompiled test application built with Intel C/C++ compiler and linked with IPP 5.3.

Regards,
Vladimir

Attachment: ijl_vs_ipp53.zip (402.32 KB)

Hi Vladimir,

Yes, it's nice I managed to reproduce the results.

1.
Ok, we'll stick to 5.3 until 6.0 is official.
Your 5.3 app gave the same result as the 6.0.

2.
Ok, I was a bit unclear. It is faster, as you say. However, we often decode several motion JPEG streams at once, and whenever we decode more than one stream at once we will not see the performance improvement, since the CPU load doubles.
Anyway, we prefer IPP since it fully utilizes the CPU even when we decode only one stream, and also since the color conversion is improved.

3.
The performance on the Pentium D system is the same with the 5.3 application I got from you...

I have one (perhaps last) problem!
I haven't been able to get the performance improvement with the 5.3 application I rebuilt from your 6.0 application. I'm not sure why, but one guess is that it's because I don't have "libiomp5mt.lib" and had to remove it from Additional Dependencies for the linker. Could this be the case? If so, where can I get the library?
If not, what else could be the problem?
The application runs ok, but it only uses 50% CPU, so the performance is of course half that of your application, on both Pentium D and Core 2 Duo systems.

Br,
Robert

vladimir-dudnik (Intel)'s picture

Hi Robert,

libiomp5 is the new Intel OpenMP run-time library. It comes with the Intel compiler starting from version 10.1. If you have a previous version of the Intel compiler you can replace it with the libguide library. If you do not have the Intel compiler at all, but do have MS VC2005, you can modify the Makefile in such a way as to enable OpenMP threading in the JPEG codec (basically you need to add the compiler option -openmp).

Regards,
Vladimir

Hi,

I have tried both with libiomp5 and the -openmp option, but neither helps.

Are there any differences between 5.3 and 6.0 projects? (I only have the 6.0 project, which I tried to make 5.3...).
Or is it easiest if I could have your 5.3 project as well?

Br,
Robert

vladimir-dudnik (Intel)'s picture

Robert,

To build the jpeg.lib library I used the IPP JPEGView sample. When you build it with build32.bat icl101, it enables OpenMP threading automatically in the IPP 5.3 version. In the IPP 6.0 beta version it also enables OpenMP threading when you build it with VC2005.

Regards,
Vladimir

Ok, I rebuilt jpeg.lib and now it works fine!

Thank you very much for your patience in this matter Vladimir!

Br,
Robert

vladimir-dudnik (Intel)'s picture

You are welcome! Please let us know what you think of the functionality and interface of the IPP JPEG codec, and about any inconsistencies you find in it; we will be glad to improve it further.

I need to inform you that there was a threading issue in the IPP 5.3 JPEG encoder which was fixed in IPP 6.0. The issue can lead to a corrupted JPEG stream when you repeatedly encode frame by frame with the threaded version of the JPEG encoder (there was a lack of synchronisation between threads).

Vladimir
