UIC Encoding/Decoding JPEG images - very slow

Hello,

I am using the UIC samples to decode JPEG images and it seems to be too slow. It takes about 3 sec to decode my test image, while I can decode it in about 0.8 sec using FreeImage (http://freeimage.sourceforge.net/, which itself uses libjpeg, I believe).

Here is the output of the test program for UIC. Note that I changed the code a bit to get the 'real' user timing (the "using ftime" lines below). The low-level routine reports a decode time of 209 msec, but measured around the high-level call in the demo program it really takes 3 sec:

    $ ./uic_transcoder_con -t 1 -i test.jpg -o out.jpg
    Intel Integrated Performance Primitives
      version: 7.0 build 205.105, [7.0.1077.205]
      name: libippjy8.so.7.0+
      date: Apr 8 2012
    Decode using ftime : 2968 msec
    image: test.jpg, 3646x5470x3, 8-bits unsigned, color: RGB, sampling: 444
    decode time: 209.44 msec
    Encode using ftime : 3262 msec
    encode time: 465.57 msec

Any idea what I could be doing wrong? I expect decoding with UIC to be at least as fast as FreeImage.

I am using:
composer: Composer_2011.11.339 with IPP 7.0.7
ipp_samples: l_ipp-samples_p_7.0.7.049

Thanks!
Gilbert

Hi Gilbert,

Could you try your operation in single-thread mode (i.e. use the "-n 1" option)?

As far as I remember, there was an issue on Linux with measuring time using the standard "time.h" functions. These functions return the so-called "process time", which is the sum of all thread times. For example, if a piece of code is executed in 2 parallel threads and each of them takes 0.5 sec (i.e. the whole piece finishes in 0.5 sec by the wall clock), the Linux time-measuring functions will return a value of 1 sec.

If my assumption is correct, then with "-n 1" both times (the one returned by uic_transcoder and the one from your measurements) will be about the same. If not, we will investigate the issue. I hope you are using a multi-core CPU :)

Regards,
Sergey
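The process-time effect described above can be reproduced with a small sketch. It is not taken from the UIC samples: `Timing`, `burn_until` and `measure_busy_threads` are hypothetical names, and it assumes a Linux/glibc system where `std::clock()` reports CPU time summed over all threads of the process, while `std::chrono::steady_clock` is used for wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Wall-clock and process-CPU durations for one measured region, in milliseconds.
struct Timing {
    double wall_ms;
    double cpu_ms;
};

// Busy-wait until the deadline so the calling thread keeps consuming CPU time.
static void burn_until(std::chrono::steady_clock::time_point deadline) {
    volatile unsigned long sink = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        sink = sink + 1;
    }
}

// Hypothetical helper: keep `threads` threads busy for about `busy_ms`
// milliseconds and report both clocks for the whole region.
Timing measure_busy_threads(int threads, int busy_ms) {
    assert(threads > 0 && busy_ms >= 0);

    const std::clock_t cpu_start = std::clock();               // process CPU time
    const auto wall_start = std::chrono::steady_clock::now();  // wall-clock time
    const auto deadline = wall_start + std::chrono::milliseconds(busy_ms);

    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back(burn_until, deadline);
    }
    for (std::thread& t : pool) {
        t.join();
    }

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timing result;
    result.wall_ms =
        std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
    result.cpu_ms =
        1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return result;
}
```

Calling `measure_busy_threads(2, 500)` on a machine with at least two free cores should give a `cpu_ms` of roughly twice `wall_ms`; with one thread the two values should be close. The exact numbers depend on the machine and its load.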

Hi Sergey,
I tried -n 1 and it does not change anything. The time reported is still ~240 msec, but the real time is ~3 sec.

The real time of 3 sec is confirmed by the external calls to ftime() before and after the high-level decode routine.

The computer I run this on has 8 CPUs, and I compiled the samples with the script build_intel64.sh.

Can I do something to speed it up? FreeImage can decode the image in about 0.8 sec (real user time), so I expect to get at least that with UIC. With UIC being almost 4 times slower, I must be doing something wrong, but I cannot find it.

    $ ldd ./uic_transcoder_con
    /usr/java/jdk1.6.0_16/jre/lib/amd64/libjsig.so (0x00002adfcf199000)
    libuic_core.so => ./libuic_core.so (0x00002adfcf29c000)
    libuic_io.so => ./libuic_io.so (0x00002adfcf4a2000)
    libuic_bmp.so => ./libuic_bmp.so (0x00002adfcf6aa000)
    libuic_pnm.so => ./libuic_pnm.so (0x00002adfcf8af000)
    libuic_jpeg.so => ./libuic_jpeg.so (0x00002adfcfac1000)
    libuic_jpeg2000.so => ./libuic_jpeg2000.so (0x00002adfcfe40000)
    libuic_dds.so => ./libuic_dds.so (0x00002adfd00fc000)
    libuic_png.so => ./libuic_png.so (0x00002adfd030f000)
    libuic_tiff.so => ./libuic_tiff.so (0x00002adfd0552000)
    libuic_jpegxr.so => ./libuic_jpegxr.so (0x00002adfd0759000)
    libippch.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippch.so.7.0 (0x00002adfd09b1000)
    libippdc.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippdc.so.7.0 (0x00002adfd0ab8000)
    libippcc.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippcc.so.7.0 (0x00002adfd0bc3000)
    libippcv.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippcv.so.7.0 (0x00002adfd0ce0000)
    libippj.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippj.so.7.0 (0x00002adfd0e05000)
    libippi.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippi.so.7.0 (0x00002adfd0f19000)
    libipps.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libipps.so.7.0 (0x00002adfd10ca000)
    libippcore.so.7.0 => /opt/intel/composerxe/ipp//lib/intel64/libippcore.so.7.0 (0x00002adfd1233000)
    libiomp5.so => /opt/intel/composerxe/lib/intel64/libiomp5.so (0x00002adfd134c000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e29600000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003e28600000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003e6fe00000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003e6e200000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003e27e00000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003e28200000)
    libtbb.so.2 => /usr/lib64/libtbb.so.2 (0x00002adfd1648000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003e27a00000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003e29e00000)

    $ cat /proc/version
    Linux version 2.6.18-128.7.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Aug 24 08:21:56 EDT 2009

Thanks.
Gilbert

Here is an update: before UIC, I was using a wrapper around the old IPP samples to decode a JPEG. Those files are now more or less in ipp-samples\realistic-rendering\3d-viewer\jpegcodec\*.

With the code in these files, I do get a significant improvement in decoding: about 200 msec compared to 800 msec with FreeImage.

Any idea how to get that with UIC? UIC is nicer because it supports more formats, and JPEG in CMYK.

Thanks.
Gilbert

I just found that the CTimer object at the beginning of the DecodeImage() routine takes a long time to create. I have removed it, and now the DecodeImage() routine is blazing fast as expected, about 4 times faster than FreeImage.

I am not familiar with CTimer and I don't know whether this is a known problem, but I can certainly live without it (ftime() works great with no overhead).

So if, like me, you want to wrap the UIC library into your own project and use the DecodeImage() and EncodeImage() high-level functions, you should remove the CTimer object from these routines (unless this problem is specific to my installation).

My problem is solved.
Gilbert

Hi Gilbert,

Nice to hear that the problem is resolved.

Yes, you are right about CTimer::Init. It is not a "known problem", but it is a specific behavior that we should have documented. On Linux, CTimer::Init calls 'ippGetCpuFreqMhz' (you can see this in the timer.cpp source file). The ippGetCpuFreqMhz function measures the CPU frequency directly by counting CPU clocks, and this measurement takes about 3 seconds. On Windows there is no such problem, because the Init function reads the frequency from the system performance counters.

I will speak with the information development team about adding a note on this to the ippGetCpuFreqMhz description. I will also add a note to the uic_transcoder_con description (and its console output) and will probably modify the uic_transcoder_con source code to avoid any timer operations when no timing is requested in the command-line options.

Thank you again,
Sergey
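The change proposed above, paying for the slow frequency measurement only when timing is actually requested, could look roughly like the following sketch. It is not the CTimer source: `LazyTimer` and its members are hypothetical, and the frequency measurement is passed in as a callback so that the example does not depend on IPP. In real code that callback is where a call such as ippGetCpuFreqMhz would go.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical timer helper that defers the expensive CPU-frequency
// measurement until a conversion from ticks to time is first requested.
class LazyTimer {
public:
    // `measure_mhz` stands in for the slow frequency measurement.
    explicit LazyTimer(std::function<double()> measure_mhz)
        : measure_mhz_(std::move(measure_mhz)) {}

    // Returns the cached frequency, measuring it on first use only.
    double FrequencyMhz() {
        if (!calibrated_) {
            mhz_ = measure_mhz_();
            assert(mhz_ > 0.0);  // a non-positive frequency would be a bug
            calibrated_ = true;
            ++calibrations_;
        }
        return mhz_;
    }

    // Converts CPU ticks to milliseconds: 1 MHz equals 1000 ticks per msec.
    double TicksToMs(double ticks) {
        return ticks / (FrequencyMhz() * 1000.0);
    }

    // Number of times the slow measurement has actually run.
    int calibrations() const { return calibrations_; }

private:
    std::function<double()> measure_mhz_;
    double mhz_ = 0.0;
    bool calibrated_ = false;
    int calibrations_ = 0;
};
```

With this shape, constructing the timer at the top of a decode routine costs almost nothing, and a caller that never asks for timing never triggers the measurement.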

Hi Sergey,

In my own code, I have always used ftime() directly to get the real user time. It has no overhead. This is what I now use in my copy of the IPP samples. I am not sure about the advantage of CTimer: is it better at measuring the CPU time in a specific thread?

Thanks
Gilbert

Gilbert,

ftime() should be OK, since it shows astronomical (absolute, wall-clock) time, though it may not be precise enough. In our measurements we try to use CPU clocks, because a) the time intervals we need to measure are usually shorter and b) these values are to some extent frequency-invariant (they mostly depend on the CPU architecture).

There is another Linux function, clock(), which returns the total process time, including all children of the process (i.e. parallel threads). This function must be used carefully.

Regards,
Sergey
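For wall-clock timing with finer resolution than the milliseconds ftime() provides, one option not mentioned in this thread is `std::chrono::steady_clock` from C++11. The sketch below is an assumption about how a wrapper around a decode call might be timed; `wall_clock_ms` is a hypothetical helper, not part of UIC or IPP.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` once and return the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double wall_clock_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();

    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    assert(ms >= 0.0);  // steady_clock never goes backwards
    return ms;
}
```

A caller would pass a lambda that invokes its own decode wrapper, for example `wall_clock_ms([&] { decode_my_image(); })`, where `decode_my_image` is a placeholder for whatever routine is being measured.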

If you are hoping to support CMYK images using UIC with no effort on your part, then get ready for a (not so pleasant) surprise. To display colors from a CMYK JPEG correctly, you have to use color management, which UIC does not provide. That means you must use an external library (such as LittleCMS), which will significantly slow down your decoding.
