trebo
January 24, 2008 12:48 AM PST
ijl15 vs ijl20 vs IPP jpeg decode performance

Hi,

We have been using ijl15 for decoding jpeg images for quite a while.
We have now upgraded to IPP 5.3 and ijl20 and we are noticing a performance slowdown in decoding jpeg images.

The versions of the ijl are:
ijl15 - 1.5.4.36
ijl20 - 2.0.18.50

What we do is basically:

JPEG_CORE_PROPERTIES m_jcp;
BYTE* m_pData;
ijlInit(&m_jcp);
// The compressed buffer must be set before the header is parsed.
m_jcp.JPGBytes = pPicData;
m_jcp.JPGSizeBytes = dwPicDataSize;
ijlRead(&m_jcp, IJL_JBUFF_READPARAMS);
// dwSize is the size of the output buffer (computed elsewhere).
m_pData = (BYTE*)ippMalloc(dwSize);
m_jcp.DIBBytes = m_pData;
ijlRead(&m_jcp, IJL_JBUFF_READWHOLEIMAGE);

And this is the performance I get with exactly the same code for ijl15 and ijl20:

Using ijl15
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1125 ms
Average: 11.250000 ms per pic
Average: 1.255222 ms per 100.000 pixels
Average: 5.498104 ms per 100.000 bytes

Using ijl20
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1219 ms
Average: 12.190000 ms per pic
Average: 1.360102 ms per 100.000 pixels
Average: 5.957501 ms per 100.000 bytes


Is this a known issue? Can I do something to get better performance with ijl20?

Br,
Robert

 

Vladimir Dudnik (Intel)
January 24, 2008 3:40 AM PST
#1

Hi Robert,

Could you please tell us which operating system and hardware platform you are working on?

By the way, IJL is a deprecated library. We now recommend moving your code to the IPP JPEG codec, which is part of the JPEGView sample (in the IPP image-codecs sample package). The IPP JPEG codec is the fastest JPEG codec; to achieve its performance it uses additional processor cores if they are available in the system. It also supports lossless mode (both encoding and decoding) and images with 16 bits per color channel.

Regards,
  Vladimir



trebo
January 24, 2008 3:57 AM PST
#2 Reply to #1

Hi Vladimir,

Thanks for your quick answer!

OS:
MS Win XP Pro V.2002, SP 2 (5.1 build 2600)

Platform:
Intel Pentium D CPU 3.40 GHz (2 CPUs), 2 GB RAM


We will investigate using IPP JPEG codec as well!

Br,
Robert



trebo
January 24, 2008 7:34 AM PST
#3 Reply to #2

Hi again,

We have now done some testing with the IPP JPEG codec. However, we have not been able to improve performance. On the contrary it got slightly worse!

Using IPP JPEG codec version 5.3.1.064
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1375 ms
Average: 13.750000 ms per pic

Compare this to the ijl results:

Using ijl15
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1266 ms
Average: 12.660000 ms per pic

Using ijl20
0.896MP-0.20MB-1152x778.JPG
Reading 100 JPEG pics in: 1344 ms
Average: 13.440000 ms per pic


The code is taken from the sample app JPEGView and looks as follows:

void GetImageFromJPEG()
{
    CIppImage m_image;
    PARAMS_JPEG m_param_jpeg;
    Ipp8u* pJPEG = 0;
    int JPEGSize;
    JERRCODE jerr;
    CMemBuffInput in;
    CFile jpeg;

    printf("Using IPP JPEG codec version 5.3.1.064\n");
    printf("0.896MP-0.20MB-1152x778.JPG\n");
    // Backslashes in the path must be escaped in a C++ string literal.
    jpeg.Open("Pics\\0.896MP-0.20MB-1152x778.JPG", CFile::modeRead|CFile::typeBinary);

    // Read the whole compressed file into memory.
    JPEGSize = (int)jpeg.GetLength();
    pJPEG = (Ipp8u*)ippMalloc(JPEGSize);
    jpeg.Read(pJPEG,JPEGSize);
    jpeg.Close();
    jerr = in.Open(pJPEG,JPEGSize);
    m_image.Color(JC_UNKNOWN);

    int iterations = 100;
    DWORD dwStartTick = GetTickCount();
    long pos = 0;

    // Decode the same in-memory JPEG repeatedly, rewinding the input after each pass.
    for(int i=0; i<iterations; i++)
    {
        jerr = ReadImageJPEG(&in,&m_param_jpeg,&m_image);
        in.TellPos(&pos);
        jerr = in.Seek(pos*-1, 1);
    }

    DWORD dwEndTick = GetTickCount();
    printf("Reading %d JPEG pics in: %lu ms\n", iterations, dwEndTick - dwStartTick);
    printf("Average: %f ms per pic\n", (double)(dwEndTick-dwStartTick) / iterations);
}

Any idea why we don't get better performance with the IPP JPEG codec than with ijl15 and ijl20?
As stated above, we are using IPP version 5.3.1.064

Br,
Robert



Vladimir Dudnik (Intel)
January 24, 2008 12:22 PM PST
#4 Reply to #3

That looks strange.

For ReadImageJPEG() you need to initialize the m_param_jpeg struct; it contains parameters that control the decoder's behaviour.

If you link with the IPP static libraries, please make sure you call the ippStaticInit function at the beginning of your application (this is not necessary with the DLLs).

Vladimir



trebo
January 25, 2008 5:42 AM PST
#5 Reply to #4

I have now tried different values for m_param_jpeg. However, the only parameter that improves the speed is m_param_jpeg.dct_scale.

When I increase this from JD_1_1 it gets faster:
JD_1_1 - 1750 ms
JD_1_2 - 1620 ms
JD_1_4 - 1320 ms
JD_1_8 - 1030 ms

With ijl15 it takes about 12 ms to decode one image, and that image is still full size, not scaled down as it is when the scale differs from JD_1_1.
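
To make the size difference concrete, here is a small sketch of the output dimensions at each scale. It assumes the ceiling-division rule that the IJG libjpeg decoder uses for scaled output; whether the IPP codec rounds the same way is an assumption, and `scaled_dim` is a hypothetical helper:

```cpp
// Hypothetical helper: output dimension after decoding at 1/denominator
// scale, rounded up (ceiling division) as IJG libjpeg does.
int scaled_dim(int full_dim, int denominator) {
    return (full_dim + denominator - 1) / denominator;
}
```

For the 1152x778 test image this gives 576x389 at 1/2, 288x195 at 1/4 and 144x98 at 1/8, so JD_1_8 produces roughly 1/64 of the pixels while the measured time only drops from 1750 ms to 1030 ms.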

And I am using the DLLs, so this shouldn't be the issue either...

Any other suggestions?

Br,
Robert



Vladimir Dudnik (Intel)
January 25, 2008 6:02 AM PST
#6 Reply to #5

Hi Robert,

Could you please attach the problem image files that demonstrate the issue? Our testing shows that the IPP JPEG codec outperforms the old IJL library on a Core 2 system by various factors, depending on the image compression mode.

Regards,
  Vladimir



trebo
January 28, 2008 4:47 AM PST
#7 Reply to #6

Hi Vladimir,

Thanks for looking into this issue. We really appreciate it!

I've attached the picture with which we've done the testing.

Br,
Robert



trebo
February 20, 2008 4:10 AM PST
#8 Reply to #7

Hi again,

Did you have any chance to look into the issue?

Do you need any other information from me?

Br,
Robert



Vladimir Dudnik (Intel)
February 20, 2008 6:38 AM PST
#9 Reply to #8

Hi Robert,

We were not able to reproduce the performance issue with your image. Could you please provide your test project so we can see what the reason might be?

Regards,
  Vladimir



trebo
February 21, 2008 6:25 AM PST
#10 Reply to #9

Hi Vladimir,

I attach my test projects for you to look into.

The project "IPP-JPEGdecode" uses IPP 5.3 JPEG codec.
I have also provided the executable: IPP-JPEGdecode.exe.

The project "IPP-IJL15-20-compare" uses IJL15 or IJL20.
I have also provided the executables: IPP-IJL15.exe and IPP-IJL20.exe

I have also provided the test image.
I have not provided any Intel files. If you need any of them just let me know.

I have tested on three different computers with about the same setup:
OS: MS Win XP Pro V.2002, SP 2 (5.1 build 2600)
Platform: Intel Pentium D CPU 3.40 GHz (2 CPUs), 2 GB RAM

The result is the same on all machines. IJL15 is fastest and JPEG codec is slowest:
IJL15: 13.5 ms per picture (50% CPU usage, 100% of one core)
IJL20: 14.5 ms per picture (50% CPU usage, 100% of one core)
JPEG codec: 20 ms per picture (100% CPU usage, 100% of each core)

So, in my test "JPEG codec" is actually much slower since it uses 100% of two cores, while IJL only uses 100% of one core...!

Also, the colors are inverted or something in the images produced by the IPP JPEG codec. Maybe this is a hint as to what's wrong!?

Please let me know if you need anything else from me!

Br,
Robert



Vladimir Dudnik (Intel)
February 26, 2008 12:13 AM PST
#11 Reply to #10

Hi Robert,

Please note that the IPP JPEG codec (part of the JPEGView sample) uses OpenMP threading. To enable it you need to compile the codec with the Intel C/C++ compiler using the sample's build script, or, if you use a Visual Studio project, specify the /Qopenmp option for ICL or the /openmp option for the VC2005 compiler.

Second, it is better to link the codec with the IPP static libraries.

With all these conditions met, our tests show that the IPP JPEG codec is the fastest of the codecs compared (IJL 1.5, IJL 2.0 and the IJG JPEG codec).
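
One way to check whether a given build really has OpenMP enabled is to test the standard `_OPENMP` macro, which compilers define only when the OpenMP option is switched on. A small sketch; the function name is hypothetical and is not part of the JPEGView sample:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns the number of threads an OpenMP parallel
// region may use, or 0 if this file was compiled without OpenMP support.
int openmp_threads_available() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 0;
#endif
}
```

A result of 0 means the OpenMP switch did not reach the compiler for that file. A result of 1 on a dual-core machine suggests that threading is enabled but limited, for example by the OMP_NUM_THREADS environment variable.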

Regards,
  Vladimir



trebo
February 27, 2008 2:25 AM PST
#12 Reply to #11

Thanks for your support and patience in this matter Vladimir!

I'm using VC2005, and I have set the Language option OpenMP Support to Yes.
I have also tried with static library linking. But I don't get any better performance than previously stated.

So, I also tried with the JPEGView sample. I made this modification in JPGViewDoc.cpp:

DWORD dwStartTick = GetTickCount();
long pos = 0;
for(int i=0; i<100; i++)
{
  jerr = ReadImageJPEG(&in,&m_param_jpeg,&m_image);
  in.TellPos(&pos);
  jerr = in.Seek(pos*-1, 1);
}
string.Format("Decoded 100 jpeg pics in: %lu ms", GetTickCount()-dwStartTick);
AfxMessageBox(string,MB_OK);

This measures the time it takes to decode 100 pictures. I built the application with the originally supplied build32.bat and Makefile. Still, the performance is worse than with IJL!
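
GetTickCount typically advances in steps of about 10 to 16 ms, which is adequate for a loop that runs a second or two but coarse for timing single decodes. A portable sketch of the same measurement with `std::chrono::steady_clock`; `time_iterations` is a hypothetical helper, and the callable stands in for the ReadImageJPEG-plus-rewind loop body:

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `iterations` times and returns the
// average wall-clock time per iteration in milliseconds.
template <typename Work>
double time_iterations(int iterations, Work work) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();  // e.g. decode one picture and rewind the input buffer
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

The decode loop would be passed in as a lambda, so the same harness can time IJL and the IPP codec without duplicating the timing code.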

These are the results, in milliseconds to decode one picture (Ijl15 is fastest and JpegView slowest):

Picture     Ijl15   Ijl20   IPP 5.3   JpegView
640x480     4.3     5       6.25      6.4
1152x778    12.9    13.6    18.5      19.2

Isn't the sample application optimized?
What can be wrong?
How do we solve this in the easiest way?

Br,
Robert



bmadiraj
March 12, 2008 12:03 AM PDT
#13 Reply to #1

Hi,

I am working on an application that compresses the video memory buffer using the ijl20 JPEG library. It works fine with the 24-bit buffer generated by my display driver. To meet a new requirement, I am creating a 16-bit display driver, but JPEG compression now fails in ijlWrite with either IJL_UNSUPPORTED_SUBSAMPLING or IJL_INVALID_JPEG_PROPERTIES. I have read in various places that the old JPEG library doesn't support 16-bit channels, whereas the new IPP JPEG 2000 codec does. I would be grateful if you could clarify whether ijl20 supports 16-bit channels or not.

Regards,

Bhavani Madiraju

 

 



jggirish
March 13, 2008 11:29 PM PDT
#14 Reply to #10

Hi,

If I run the same code (Intel zip), it shows:

error when decoding jpeg data-buffer too small

What is the reason?



Vladimir Dudnik (Intel)
March 20, 2008 3:43 PM PDT
#15 Reply to #14

When people talk about 16-bit images they can mean different things. Do you mean the YUY2 format, the RGB565 format, or a format with 16 bits per color channel (some medical images use that)?

I would recommend that you migrate from IJL to the IPP JPEG codec found in the JPEGView sample. That code supports lossless compression with 16 bits per color channel and also supports the YUY2 format (specified as JC_YCBCR and JS_422).

Regards,
  Vladimir



Igor Levicki
March 20, 2008 6:39 PM PDT
#16 Reply to #15

Vladimir, what is the reason for not including that JPEG code in the IPP itself if it is so great and fast as you say?

Another point to consider is that some people still have project requirements to use only plain C code, which is not possible if you implement the JPEG codec as a C++ class.

Finally, I have noticed that IPP JPEG decompression speed is poor for some particular images while applications based on IJG code do not exhibit the same slowdown.

I really think you have to perform more testing with images created by different JPEG compressors and optimizers and that you should also evaluate performance on older CPUs (at least Pentium D).


--------
If you find my post helpfull, please rate it and/or select it as a best answer where applies. Thank you.


Vladimir Dudnik (Intel)
March 21, 2008 1:42 AM PDT
#17 Reply to #16

Hi Igor,

What do you mean by "including that JPEG code in the IPP itself"? There are two different things here. The first is the binaries of the optimized low-level libraries, which provide a C interface. Those are the IPP libraries, and they do not contain any high-level components such as codecs, file readers, renderers and so on. The second is the set of IPP samples, which are available as source code and demonstrate how you can implement high-level components such as codecs, and how you can build an application that combines them into a kind of final solution, for example an image viewer.

So all JPEG codecs available within the IPP samples (IJG, IJL and the IPP JPEG codec) are actually built on top of the same IPP libraries. The different codecs may use different sets of IPP low-level functions, and they may have different architectures and feature sets.

That's correct, an additional wrapper is required around the C++ based codec for those who need a pure C interface. But please note that the industry trend is to move from C to C++, and personally I think that will continue.

Thanks for reporting your findings. Could you please attach a sample image that causes the performance issue you mention? That would help us reproduce and investigate it.

Please note that the color conversion functions used in the original IJL library are not as precise as the counterparts we developed in IPP. The IJL functions use 8-bit fixed-point precision for YCbCr to RGB conversion, whereas the IPP JPEG color conversion functions use at least 14 bits. The other source of accuracy loss is the IDCT operation; in IPP we have a high-precision IDCT function. You can use a simple test to see the difference in JPEG decoding accuracy between the original IJL and the IPP JPEG codecs:
1. Choose an uncompressed reference image, say test.bmp.
2. Compress it with a reference JPEG encoder (you may use the original IJG cjpeg utility).
3. Decompress it to BMP with the original IJL codec.
4. Decompress it to BMP with any IPP codec.
5. Calculate the absolute differences with formulas like diff_ijl.bmp[i,j] = abs(test.bmp[i,j] - ijl.bmp[i,j]) and diff_ipp.bmp[i,j] = abs(test.bmp[i,j] - ipp.bmp[i,j]).

Then you can see in which case the absolute difference is higher.

Regards,
  Vladimir



Igor Levicki
March 21, 2008 9:50 AM PDT
#18 Reply to #17

Vladimir, I will see if I can find that particular image for you and attach it.

As for the industry trend, I agree that we are moving towards C++, but C still has its place in embedded systems and in systems that have to interface with code written in assembler or some other language.

As for the IPP, I find the lack of a complete JPEG codec implementation a bit disturbing. As you say, there are several ways to skin the cat (several samples), but none of them is a complete solution. It is OK to offer a low-level interface for those who need specialized features, but implementing a high-level interface in the library itself is equally important in my opinion.

For example, most applications need to decode whole image at once from the source buffer into the destination buffer to be able to display it. Why not also provide high-level API which does just that so that they don't have to keep reimplementing it?

When we are at it, does IPP support decoding embedded color profiles and applying those on the decoded image?




trebo
March 31, 2008 8:26 AM PDT
#19 Reply to #18

Hi Vladimir,

As I mentioned in a previous reply (02-27-2008, 2:25 AM), I have tested the JPEG decode performance with the JPEGView sample, using the originally supplied build32.bat and Makefile. The JPEG codec still performs worse than IJL 15 and 20.

How come?
Shouldn't using the originally supplied build32.bat and Makefile ensure that the JPEG codec is used in an optimal way and, according to you, is faster than both IJL 15 and 20?

 



Vladimir Dudnik (Intel)
March 31, 2008 3:21 PM PDT
#20 Reply to #19

Hi Robert,

We are still not able to reproduce that with your test application (I do not have a Pentium D at hand, so I used a Core 2 Duo system).

You are correct, the original build script should give results like those we published in the IJL-IPP sample's documentation (on a similar system, of course), although we did use the Intel C/C++ compiler for performance testing.

Regards,
  Vladimir



trebo
May 7, 2008 2:30 AM PDT
#21 Reply to #20

Hi Vladimir,

I got my hands on a Core 2 Duo system, and I still get the best performance with IJL15 and the worst with IPP.
Milliseconds to decode a 1152x778 picture:
IJL15     IJL20     IPP
11.7      12.8      14.5

Even if we disregard my own test program, I get the same result with your sample applications!
I have tested this application: Ipp5.3.2ipp-samplesimage-codecsjpeg-ijlinwin32_cl8jpgview.exe
Which I suppose uses IJL20.
And this application: Ipp5.3.2ipp-samplesimage-codecsjpegviewinwin32_cl8jpgview.exe
Which I suppose uses IPP decoder.
The test results on a Core 2 Duo system with Vista Business 6.0 build 6000 are as follows (milliseconds to decode a 1152x778 picture, read from 'USEC' in the program's status bar):
IJL20         IPP
12.8-15.8     13.7-31.7

We clearly see that IJL20 is faster, with 12.8 ms compared to 13.7 ms for IPP.
We also see that IPP has a much bigger spread: its highest value is 31.7 ms compared to 15.8 ms for IJL20.
How come?
What are your test results if you compare both applications?

If IPP still is the fastest for you, can you please provide me with test applications where IPP is faster than IJL, so I can try this on my Core 2 duo system?

Br,
Robert

 



Vladimir Dudnik (Intel)
May 7, 2008 5:19 PM PDT
#22 Reply to #21

Hi Robert,

I've attached the test program I used this time (to rebuild it you will need the old IJL library, which we no longer distribute). The precompiled executable is in the Release folder. If you specify no parameters, a generated image is used for testing; otherwise you need to specify the name of a valid BMP file (24 bits per pixel).

Regards,
  Vladimir



trebo
May 8, 2008 8:59 AM PDT
#23 Reply to #22

Vladimir, thanks for the test program!

I have tested it now and, as you say, it shows better performance with IPP than with IJL15 on a Core 2 Duo system.
However, there is more to consider!

1.
You are using IPP 6.0.82.530.
We are using IPP 5.3.85.467, which is the latest version released to us.
IPP 5.3 and IJL15 have about the same performance on Core 2 Duo, and IJL15 is faster than IPP 5.3 on Pentium D!
Why are you comparing with 6.0 when the latest release is 5.3.2?
Why hasn't this been mentioned?

2.
CPU usage.
It's a fact that IPP 6.0 is faster than IJL15 and IPP 5.3 on Core 2 Duo. But it also doubles the total CPU usage from 50% to 100%. Since IPP 6.0 is twice as fast, it actually isn't faster at all once CPU usage is taken into account!

3.
Pentium D.
On my Pentium D machine I get the following results with a 2880x1944 image:
IJL15:   116 ms
IPP 5.3: 145 ms
IPP 6.0: 140 ms
IJL15 is clearly the fastest.
IPP 5.3 is actually second best, since it is only slightly slower than IPP 6.0 while consuming only 50% CPU. IPP 6.0 has 100% CPU usage.


Considering these three facts, I really can't see any performance improvement with IPP compared to IJL15, with either IPP 5.3 or IPP 6.0, on either Pentium D or Core 2 Duo systems.
What are your comments on this? Is there more to be considered?

Br,
Robert

 



Vladimir Dudnik (Intel)
May 8, 2008 4:46 PM PDT
#24 Reply to #23

Hi Robert,

So, basically you were able to reproduce the results I get on my system (IPP JPEG is faster than the old IJL library). That's good.

1. The IPP 6.0 beta was just published; you can register and download it from the IPP main page. Just in case, I have also attached the same pre-built application linked with IPP 5.3. Please try it and let us know the results on your system. On our side it shows that IPP outperforms IJL just as the IPP 6.0 beta did in the previous application.

2. "...it actually isn't faster at all if the CPU usage also is considered!". There is probably a discrepancy in terms. We call something faster when it does more work in the same amount of time. That says nothing about how computationally intensive it is to make things faster.

3. Unfortunately, I do not have a Pentium D system at hand, so I can't test it. By the way, one guess that just occurred to me: IJL was compiled with the Intel C/C++ compiler, whereas the application attached to my previous post was compiled with VC2005, which might be one reason for the worse performance. The second reason, as I already said elsewhere in this thread, is that we increased the arithmetic precision of the color conversion functions in IPP because many customers complained about relatively large rounding errors in IJL. That cost us some performance. You may compare PSNR for the IJL and IPP JPEG codecs.
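
For 8-bit images, PSNR is 10 * log10(255^2 / MSE), where MSE is the mean squared difference from the reference image. A minimal sketch, assuming both images are available as equally sized buffers of 8-bit samples; `psnr_db` is a hypothetical name, not an IPP function:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in decibels between a reference image and a decoded image, for
// 8-bit samples (peak value 255). Returns +infinity when the compared
// samples are identical or when there is nothing to compare.
double psnr_db(const std::vector<std::uint8_t>& reference,
               const std::vector<std::uint8_t>& decoded) {
    const std::size_t n =
        reference.size() < decoded.size() ? reference.size() : decoded.size();
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = static_cast<double>(reference[i]) -
                         static_cast<double>(decoded[i]);
        sum_sq += d * d;
    }
    if (n == 0 || sum_sq == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    const double mse = sum_sq / static_cast<double>(n);
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

A higher value means the decoded image is closer to the reference, so the codec with the more precise color conversion and IDCT should score higher on the same compressed file.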

Taking all of that into account, I see that at least on a Core 2 system (where I can run this test) IPP does the work in 60 msec (compression) and 57 msec (decompression), while IJL takes 191 msec and 93 msec respectively. From my perspective, 60 msec to compress a 2Kx2K image is faster than 191 msec for the same job. I also expect that doing the work more than twice as fast will definitely require more processor resources.

Please find attached the precompiled test application built with the Intel C/C++ compiler and linked with IPP 5.3.

Regards,
  Vladimir



trebo
May 9, 2008 3:37 AM PDT
#25 Reply to #24

Hi Vladimir,

Yes, it's nice I managed to reproduce the results.

1.
Ok, we'll stick to 5.3 until 6.0 is official.
Your 5.3 app gave the same result as the 6.0.

2.
Ok, I was a bit unclear. It is faster, as you say. However, we often decode several motion JPEG streams at once, and whenever we decode more than one stream we will not see the performance improvement, since the CPU load doubles.
Anyway, we prefer IPP since it fully utilizes the CPU even when we decode only one stream, and also because the color conversion is improved.

3.
The performance on the Pentium D system is the same with the 5.3 application I got from you...


I have one (perhaps last) problem!
I haven't been able to get the performance improvement with the 5.3 application I rebuilt from your 6.0 application. I'm not sure why, but one guess is that it's because I don't have "libiomp5mt.lib" and had to remove it from the linker's Additional Dependencies. Could this be the case? If so, where can I get the library?
If not, what else could be the problem?
The application runs OK, but it only uses 50% CPU, so the performance is of course half that of your application, on both Pentium D and Core 2 Duo systems.

Br,
Robert




