Tai Ha, Intel Corporation
Chao Yu, Intel Corporation
NHN Corporation is Korea's premier Internet company, operating the nation's top search portal, Naver (www.naver.com); the leading online game portal, Hangame (www.hangame.com); the nation's largest children's portal, Jr.Naver (jr.naver.com); and the first online donation portal, Happybean (happybean.naver.com).
Starting from the business pillars of search and games, NHN has rolled out a wide range of innovative and convenient online services. A number of surveys demonstrate that the company is regarded as an undisputed leader in the online services industry worldwide.
NHN has emerged as Korea's largest Internet company in terms of net profit. This growth is largely attributable to the company's efforts to create stable revenue streams in its core services, for instance search-oriented advertising services and fee-based games.
NHN outgrew the confines of Korea's borders. NHN built a good reputation in Japan and China, and also set up NHN USA, forging towards becoming a world-class internet company.
Internet portal sites such as Naver use thumbnail images, which are reduced-size versions of original images, throughout their services. At Naver, creating a thumbnail image takes three steps: decoding the original image, resizing it, and encoding the result as a thumbnail image. ImageMagick*, a well-known image processing library on Linux*, is widely used for this process.
Recent CPUs provide the Streaming SIMD Extensions instruction sets (SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2, and Intel® AVX) to accelerate signal and multimedia data processing.
The NHN Performance Engineering team develops the thumbnail creation library "libNthumb", which uses the SIMD instruction sets through Intel® Integrated Performance Primitives (Intel® IPP). Intel® IPP provides optimized software building blocks for multimedia, data, and image processing applications. Intel® IPP is available as a component of Intel® Parallel Studio, Intel® System Studio, and Intel® INDE products.
In this paper, we will show the performance benefit of libNthumb, and the technique to improve the performance on resizing JPEG images.
ImageMagick, a widely used open source library, can read, write, and convert images in more than 100 formats. It provides several APIs at different levels of abstraction as well as command-line tools. It serves as the performance baseline in this paper.
ImageMagick uses the system JPEG encoder and decoder library (for example, /usr/lib64/libjpeg.so for 64-bit applications) to process JPEG images. Often, the default JPEG library is not highly optimized.
libNthumb, a library specialized in thumbnail image creation, provides the following features:
- Fast JPEG resizing
- Using Inverse Discrete Cosine Transform (IDCT) scale factor
- SSE instruction set via Intel® IPP
- Multi-step resizing
- Sharpen filter
- Lossless auto rotation
- Metadata removal
- Cropping with Rectangle Regions of Interest (ROI)
libNthumb uses the encoders/decoders from both the Intel® IPP JPEG sample code and ImageMagick. The Intel® IPP JPEG encoder/decoder is used to adjust the IDCT scale factor, and the ImageMagick encoder/decoder is used to support various image formats and to recover partially damaged images.
libNthumb uses Intel® IPP because it improves the performance of data stream operations by using the SIMD instruction sets.
libNthumb delivers an additional performance boost when creating thumbnail images from JPEG files. Decoding a JPEG file involves several steps, one of which is the IDCT. By adjusting the scale factor in the IDCT step, the decoder produces an image that is already reduced. The smaller image shrinks the data set for the subsequent operations, which improves performance.
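As a minimal sketch of how a scale factor might be picked, the helper below chooses the largest power-of-two reduction that still leaves the decoded image at least as large as the requested thumbnail. The function name and the candidate set 1/1, 1/2, 1/4, 1/8 are assumptions made for illustration (the set follows from the 8 x 8 block structure of baseline JPEG); this is not libNthumb's actual selection logic.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the largest IDCT downscale denominator
 * (1, 2, 4 or 8) for which the decoded image is still at least as large
 * as the requested thumbnail in both dimensions.
 */
int choose_idct_denom(int src_w, int src_h, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

    int denom = 1;
    /* Try 2, 4, 8 in turn; stop once the next step would undershoot. */
    while (denom < 8) {
        int next = denom * 2;
        if (src_w / next < dst_w || src_h / next < dst_h)
            break;
        denom = next;
    }
    return denom;
}
```

For a 4000 x 3000 source and a 400 x 300 target this returns 8, so the decoder would emit a 500 x 375 image and the resize step finishes the reduction.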
The test environment, both hardware and software, is as follows.
- Test system: Intel® Core™ i7-2600K CPU at 3.40 GHz, 64 GB memory, Red Hat Enterprise Linux Server release 6.0 (Santiago), kernel 2.6.32-71.el6.x86_64
- Software libraries:
  - ImageMagick: version 6.9.0-0
  - Intel® IPP: version 8.2.0.090, and IPP sample code 7.1.1.013
  - System JPEG library: libjpeg.so.62.0.0
The benchmark repeatedly runs a transaction that consists of decoding, resizing, and encoding a single test image.
The test image is a 12-megapixel JPEG image (4000 x 3000), which is resized to a 400 x 300 thumbnail image. The code runs with one thread and reports the following performance data:
- Elapsed time for decoding
- Elapsed time for resizing
- Elapsed time for encoding
- Total elapsed time (decoding + resizing + encoding time)
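The paper does not show its benchmark harness, so the following is only a sketch of how the four values could be collected. All names here (`stage_times`, `time_stage`, `time_transaction`, `count_stage`) are hypothetical, and the stages are passed in as callbacks so that any decoder, resizer, or encoder can be plugged in. It uses the C11 `timespec_get` wall clock, which is not monotonic; a production harness would likely prefer a monotonic timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Per-transaction timings in seconds, mirroring the four reported values. */
typedef struct {
    double decode;
    double resize;
    double encode;
    double total;   /* decode + resize + encode */
} stage_times;

/* Wall-clock time in seconds (C11 timespec_get; not a monotonic clock). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run one stage callback and return its elapsed time in seconds. */
double time_stage(void (*stage)(void *), void *ctx)
{
    assert(stage != NULL);
    double start = now_seconds();
    stage(ctx);
    return now_seconds() - start;
}

/* Time one decode -> resize -> encode transaction. */
stage_times time_transaction(void (*decode)(void *), void (*resize)(void *),
                             void (*encode)(void *), void *ctx)
{
    stage_times t;
    t.decode = time_stage(decode, ctx);
    t.resize = time_stage(resize, ctx);
    t.encode = time_stage(encode, ctx);
    t.total  = t.decode + t.resize + t.encode;
    return t;
}

/* Placeholder stage that only counts how often it was invoked. */
void count_stage(void *ctx)
{
    ++*(int *)ctx;
}
```

Repeating `time_transaction` in a loop and averaging each field would give per-step averages comparable in shape to the ones reported below.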
When resizing with ImageMagick, the code uses the default settings and selects LanczosFilter for the resize:
MagickReadImage(magick_wand, InputJPEGFile); // Read file, JPEG decoding
MagickResizeImage(magick_wand, width, height, LanczosFilter, 1.0); // Image resizing
MagickWriteImages(magick_wand, OutputJPEGFile, MagickTrue); // JPEG encoding, save file
With Intel® IPP, the resize functions use Lanczos interpolation with the antialiasing feature:
ippiResizeAntialiasingLanczosInit(srcSize, dstSize, 3, …)
ippiResizeAntialiasing_8u_C3R(pSrc, srcStep, pDst, dstStep, …)
2. Performance with IDCT scaling
The figure below shows the time for JPEG decoding, image resizing, and encoding, and the total time to generate the thumbnail images. libNthumb with IDCT scaling shows about a 13.9x performance gain over ImageMagick.
For the decoding step, libNthumb shows a 6.6x performance gain over ImageMagick. libNthumb uses the optimized Intel® IPP libraries, which deliver high performance for JPEG decoding. In addition, with IDCT scaling libNthumb processes less data during decoding, which improves performance further. For the resizing step, libNthumb performs much better because, thanks to IDCT scaling, it only has to process the small image produced by the decoder.
Encoding takes less time than decoding because the data set has already been reduced by the resizing step. libNthumb still outperforms ImageMagick here because of the optimized Intel® IPP JPEG encoding code.
3. Performance without IDCT scaling
The figure below shows the average performance of JPEG decoding, image resizing, and encoding without IDCT scaling. This test uses a variant of libNthumb with the IDCT scale factor feature removed from JPEG resizing. Its purpose is to isolate the gain that Intel® IPP contributes: the difference between this variant and ImageMagick reflects the gain from Intel® IPP alone, while the difference between this test and the previous libNthumb test reflects the gain from IDCT scaling.
Without IDCT scaling, the total time to generate the thumbnail image shows about a 2.44x performance gain over ImageMagick. For the decoding step, libNthumb without IDCT scaling shows a 1.68x gain. For the resizing step it shows a 3.6x gain, and encoding is about 2.1x faster.
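The two sets of results can be combined to estimate what IDCT scaling contributes on its own. The calculation below is a rough estimate rather than a measured result: it assumes both speedups were taken against the same ImageMagick baseline and that the Intel® IPP and IDCT scaling gains multiply. The symbols $S$ are introduced here for illustration only.

```latex
\[
S_{\text{IDCT, total}} \approx
\frac{S_{\text{with IDCT}}}{S_{\text{without IDCT}}}
= \frac{13.9}{2.44} \approx 5.7,
\qquad
S_{\text{IDCT, decode}} \approx \frac{6.6}{1.68} \approx 3.9
\]
```

Under these assumptions, IDCT scaling accounts for roughly a 5.7x reduction in total time on top of the 2.44x that Intel® IPP provides.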
Performance Improvement Factors
As the previous sections show, libNthumb is far faster than ImageMagick at creating thumbnail images. Two main factors account for the improvement:
- Using SIMD instructions through Intel® IPP
- Using the IDCT scale factor
1. SIMD instructions with Intel® IPP
Intel® Streaming SIMD Extensions are a set of Single Instruction Multiple Data (SIMD) instructions designed to improve the performance of various applications. They are available on Intel® and Intel®-compatible processors. One SIMD instruction can process several data elements at the same time. For example, an SSE2 instruction can operate on two 64-bit integers or four 32-bit integers concurrently, as shown below.
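The four-lane case can be illustrated with compiler intrinsics. This is a standalone sketch, not code from libNthumb or Intel® IPP; the function name `add4_i32` is hypothetical, and a scalar fallback is included for targets where SSE2 is not enabled.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Add four pairs of 32-bit integers: out[i] = a[i] + b[i] for i = 0..3. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    assert(a && b && out);
#if defined(__SSE2__)
    /* One 128-bit register holds four 32-bit lanes. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vsum = _mm_add_epi32(va, vb);   /* four additions in one instruction */
    _mm_storeu_si128((__m128i *)out, vsum);
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The same pattern with `_mm_add_epi64` would operate on two 64-bit lanes instead of four 32-bit lanes.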
Intel® IPP functions are designed to deliver performance by matching the function algorithms to low-level optimizations based on the processor's available features such as Streaming SIMD Extensions and other optimized instruction sets.
libNthumb uses the Intel® IPP JPEG sample code for JPEG encoding and decoding. Intel® IPP provides optimized versions of the key algorithmic components of a JPEG codec, shown below.
Besides the JPEG encoding and decoding functions, libNthumb takes advantage of the Intel® IPP image resize functions. Resizing is a loop of calculations over image pixels, each of which holds 24 bits of data (8 bits each for R, G, and B). Because the Intel® IPP functions are optimized with SIMD instructions, the resizing step runs efficiently in libNthumb.
The picture below shows the result of profiling CPU cycles for each step in a batch process test. There is a significant performance improvement, especially in image resizing, for two reasons: 1) the resize function uses optimized Intel® IPP functions, and 2) by controlling the IDCT scale factor during JPEG decoding, libNthumb only has to resize an already-reduced image.
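To make the shape of that pixel loop concrete, here is a plain scalar sketch over interleaved 24-bit RGB data. It deliberately swaps in a simple box (averaging) filter with an integer shrink factor instead of the antialiased Lanczos interpolation used in the tests, and it is not how Intel® IPP implements resizing. The function name and parameters are hypothetical; the step arguments are row strides in bytes, following the `srcStep`/`dstStep` convention of the IPP calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shrink an interleaved 8-bit RGB image by an integer factor with a box
 * (averaging) filter. Source dimensions are assumed to be exact multiples
 * of `factor`. Steps are row strides in bytes.
 */
void box_downscale_rgb(const uint8_t *src, int src_w, int src_h, size_t src_step,
                       uint8_t *dst, size_t dst_step, int factor)
{
    assert(src && dst && factor > 0);
    assert(src_w % factor == 0 && src_h % factor == 0);

    int dst_w = src_w / factor;
    int dst_h = src_h / factor;
    unsigned area = (unsigned)factor * (unsigned)factor;

    for (int y = 0; y < dst_h; ++y) {
        for (int x = 0; x < dst_w; ++x) {
            unsigned sum[3] = {0, 0, 0};
            /* Accumulate the factor x factor block of source pixels. */
            for (int dy = 0; dy < factor; ++dy) {
                const uint8_t *row = src + (size_t)(y * factor + dy) * src_step;
                for (int dx = 0; dx < factor; ++dx) {
                    const uint8_t *px = row + (size_t)(x * factor + dx) * 3;
                    sum[0] += px[0];
                    sum[1] += px[1];
                    sum[2] += px[2];
                }
            }
            uint8_t *out = dst + (size_t)y * dst_step + (size_t)x * 3;
            for (int c = 0; c < 3; ++c)
                out[c] = (uint8_t)((sum[c] + area / 2) / area);  /* rounded mean */
        }
    }
}
```

Every output pixel touches `factor * factor` source pixels per channel, which is why the same per-pixel arithmetic benefits from SIMD execution and from a smaller input image.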
2. IDCT scale factor
JPEG image data can be resized by adjusting the block size during the IDCT process, and libNthumb exploits this to improve performance. During the IDCT process, it scales the image down to a size as close as possible to the thumbnail size, so the image is already reduced when decoding finishes and before the resizing process begins.
The picture above is a diagram of each step of a JPEG resize in which the JPEG image is reduced to 1/10 of its original size.
ImageMagick does not change the IDCT scale factor in the decoding process, so the decoded image has the same size as the original. libNthumb, by contrast, sets the IDCT scale factor in the decoding process and obtains a decoded image that is reduced to 1/8 of the original size.
Because resizing is a computation performed over the image's pixels, resizing the reduced image costs less than resizing the full-size image.
A thumbnail image must retain a certain level of quality. The quality difference between thumbnail images from libNthumb and from ImageMagick is invisible to the naked eye. The pictures below are thumbnail images generated by ImageMagick and libNthumb, respectively.
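The size of that saving can be estimated with a little arithmetic. The sketch below assumes the 1/8 factor applies to each dimension and that scaled dimensions are rounded up; both the helper names and the rounding rule are assumptions for illustration, not taken from libNthumb.

```c
#include <assert.h>

/* Image dimension after IDCT downscaling by 1/denom, rounded up. */
int scaled_dim(int dim, int denom)
{
    assert(dim > 0 && denom > 0);
    return (dim + denom - 1) / denom;
}

/* Number of pixels the resize step must read after 1/denom IDCT scaling. */
long long pixels_after_idct(int width, int height, int denom)
{
    return (long long)scaled_dim(width, denom) * scaled_dim(height, denom);
}
```

For the 4000 x 3000 test image, `pixels_after_idct(4000, 3000, 1)` is 12,000,000 while `pixels_after_idct(4000, 3000, 8)` is 187,500 (a 500 x 375 image), so under these assumptions the resize step reads 64 times fewer pixels.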
<Thumbnail Image by ImageMagick>
<Thumbnail Image by libNthumb>
There are various methods of resizing. Image quality will differ depending on the filter used. libNthumb improves image quality through multi-level resizing and sharpening filters, each having a different look and feel.
libNthumb is a performance-optimized library focused on thumbnail image creation. It also provides additional features that are useful for thumbnail creation in Internet portal services. For example, the auto rotation feature reads the EXIF metadata that digital cameras use to record image orientation, which most web browsers do not support. It also has metadata retention and removal features. It improves performance by using the Intel® IPP library to exploit SSE instructions and by using the IDCT scale factor.
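As an illustration of what auto rotation has to decide, the sketch below maps the EXIF Orientation tag to the clockwise rotation needed to display the image upright. The function name is hypothetical and this is not libNthumb's code; the sketch covers only the four rotation-only values and reports the mirrored variants (2, 4, 5, 7) as unhandled.

```c
#include <assert.h>

/*
 * Map an EXIF Orientation value to the clockwise rotation, in degrees,
 * needed to display the image upright. Returns -1 for mirrored variants
 * and for values outside 1..8, which this sketch does not handle.
 */
int exif_rotation_cw(int orientation)
{
    switch (orientation) {
    case 1: return 0;     /* already upright */
    case 3: return 180;   /* upside down */
    case 6: return 90;    /* rotate 90 degrees clockwise */
    case 8: return 270;   /* rotate 270 degrees clockwise */
    default: return -1;   /* mirrored (2, 4, 5, 7) or invalid */
    }
}
```

A thumbnail generator would apply the returned rotation and then drop or reset the orientation tag, so that viewers which ignore EXIF still show the image upright.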