Optimizing H.265/HEVC Decoder on Intel® Atom™ Processor-Based Platforms

Introduction

Watching video is the top usage for mobile devices. Multimedia processing is computationally intensive and has a big impact on battery life and user experience. The LCD resolution on mobile devices has improved from 480p to 720p and now to 1080p. End users want to watch high-quality videos, but for online video providers such as Youku, iQiyi and LeTV, purchasing network bandwidth becomes more expensive every year.

H.265/HEVC (High-Efficiency Video Coding), introduced last year, is the latest video codec standard developed by ISO/IEC and ITU-T. H.265/HEVC doubles the compression ratio of the previous H.264/AVC standard at the same subjective quality. HEVC technology helps online video providers deliver high-quality video with less bandwidth, making it the next video codec revolution.

Android has a multimedia API, but ISVs often find it difficult to use and it doesn’t always meet their requirements. This article presents a case study on optimizing the H.265/HEVC decoder for Intel® Atom™ processor-based platforms using the YASM Modular Assembler and several Intel® software tools to obtain even better performance.

Android multimedia codec introduction

Android provides a standard multimedia player interface that developers can use to play video from the Java* layer. The MediaCodec API was introduced in Android 4.1; developers can use it to customize their players in the Java* layer. The Android multimedia workflow is shown below:

Figure 1: Android* multimedia workflow

MediaPlayerService chooses AwesomePlayer or NuPlayer based on the video data source and format. AwesomePlayer supports local “fd://” files and some “http://” URL-integrated streaming. NuPlayer was introduced in Android 4.0 to support streaming; it mainly supports “RTSP://” URLs and some “http://” (m3u8) segmented videos.

AwesomePlayer is based on the TimedEventQueue model, while NuPlayer is based on ALooper, AHandler, and AMessage, so NuPlayer can respond quickly when playing streaming video. ACodec was introduced to support NuPlayer and does not provide open APIs. MediaCodec also calls ACodec.

AwesomePlayer, NuPlayer, and ACodec directly call the OMX codec, the new HW multimedia interface, which is largely isolated from the rest of Android. Unfortunately, the OMX codec is not cross-platform compatible, although it does call the HW platform driver.

For Android on Intel Atom processor-based platforms, we have to keep pace with Google’s developments to optimize the standard multimedia players. If you can use the standard Android multimedia player to play video on Intel Atom processor-based platforms, you can achieve good performance because the HW decoder is readily available on this platform.

In the PRC, more than 20 multimedia apps are available in the Android market, as shown below:

Figure 2: Online video market in the PRC

Although Intel encourages ISVs to use the optimized Android multimedia players for best performance on Intel Atom processor-based platforms, most online video players in the market do not adopt the standard Android multimedia players; they prefer to use open source codecs or develop their own.

The reasons are as follows:

  1. The standard Android multimedia player supports only the MP4 and 3GP formats and cannot fully support many other popular formats, such as RM, RMVB, FLV, MKV, DIVX, and WMV. These formats have to be decoded by software codecs.
  2. Compatibility problems arise when using the Android multimedia APIs to parse streaming packets. ISVs usually claim that Android multimedia’s parsing solution is only a lab solution.
  3. ISVs also claim that Google updates the Android OS often, usually changing the Android multimedia APIs along with it. These changes force ISVs to change their players, which most ISVs are reluctant to do.
  4. The APIs of the standard Android multimedia players are not flexible, so ISVs have a difficult time using them to customize their players.

Given these reasons, ISVs prefer to modify open source codecs such as FFMPEG or develop their own codec to play videos. To support all the video formats, they have to adopt software decoder solutions, and because they don’t have much experience in optimizing these open source codecs on Android for Intel® Architecture (IA), the result is higher CPU load and poor performance.

We usually encourage ISVs to use the following tools to help optimize their online video players.

These tools allow ISVs to optimize open source codecs or their own SW codec on IA-based platforms and obtain good performance. We have optimized Youku, LeTV, and QQ video on the Lenovo K900 for better performance. The CPU load for the same video decreased from an average of 40% to 8% after optimization[1]. The ISVs and the OEMs are all satisfied with the performance.

Case study for Optimizing H.265/HEVC player on Intel® Atom™ processor-based platforms

Strongene is a Chinese company focusing on core video coding technology. It provides advanced H.265/HEVC encoder and decoder codecs, which have been adopted by Xunlei Kankan online video. Its encoder/decoder solution is integrated into the open source FFMPEG project for ISVs to use.

We used Intel® VTune™ tools to profile Strongene’s H.265/HEVC decoder. Then we optimized it using the toolsets explained in the next three subsections. We obtained much higher decoding speed and lower CPU occupancy on Intel Atom processor-based platforms.

1. Optimized by YASM & Intel® C++ Compiler (Intel® ICC).

Instead of compiling the optimized assembly code in open source FFMPEG with the default Android compiler, we use YASM and the Intel® ICC compiler.

YASM is a complete rewrite of the NASM assembler under the “new” BSD License. It lets us reuse the SIMD-optimized assembly code that FFMPEG already contains for x86 platforms. Developers can download and install YASM from http://yasm.tortall.net. To use it, modify the configure.sh file to enable the YASM and ASM options before compiling FFMPEG, as shown below:

Figure 3: Modify the FFMPEG configure file

We also encouraged the ISVs to use the Intel® ICC compiler to compile the native code.

2. Optimized with Intel® Streaming SIMD Extensions (Intel® SSE) instructions:

Profiling with Intel® VTune™ tools, we found that Strongene’s codec used only C code to implement YUV2RGB conversion, so the performance was not optimal.

Intel Atom processor-based platforms support Intel’s SIMD instruction set extensions, which include MMX, MMXEXT, Intel SSE, SSE2, SSE3, SSSE3 and SSE4. Enabling the Intel SSE code in open source FFMPEG can greatly improve YUV2RGB performance.

We enabled the SSSE3 compiler option in FFMPEG, along with the MMXEXT code, as shown in the code snippet below.

Figure 4: Enable SSE code in the FFMPEG
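
As a rough illustration of why SIMD helps with YUV2RGB, the sketch below converts eight pixels per iteration with SSE2 intrinsics and includes a scalar loop for comparison. It is not Strongene’s or FFMPEG’s code: the function names, the 6-bit fixed-point BT.601 coefficients, and the planar 4:4:4 input with planar RGB output are assumptions chosen to keep the example short. It uses SSE2 rather than SSSE3 so that it builds on x86-64 without extra compiler flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

using std::uint8_t;

// BT.601 limited-range coefficients scaled by 64 (assumed 6-bit fixed point).
constexpr short kY = 75, kRV = 102, kGU = 25, kGV = 52, kBU = 129;

static inline uint8_t clamp_u8(int v) {
    return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Scalar reference: one pixel per iteration.
void yuv_to_rgb_scalar(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                       uint8_t* r, uint8_t* g, uint8_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        int c = kY * (y[i] - 16) + 32;  // luma term plus rounding offset
        int d = u[i] - 128;
        int e = v[i] - 128;
        r[i] = clamp_u8((c + kRV * e) >> 6);
        g[i] = clamp_u8((c - kGU * d - kGV * e) >> 6);
        b[i] = clamp_u8((c + kBU * d) >> 6);
    }
}

// Load 8 bytes and widen them to eight 16-bit lanes.
static inline __m128i load8_as_epi16(const uint8_t* p) {
    __m128i bytes = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(p));
    return _mm_unpacklo_epi8(bytes, _mm_setzero_si128());
}

// Shift back to 8-bit range, saturate to [0, 255], and store 8 bytes.
static inline void store8_from_epi16(uint8_t* p, __m128i v) {
    __m128i packed = _mm_packus_epi16(_mm_srai_epi16(v, 6), _mm_setzero_si128());
    _mm_storel_epi64(reinterpret_cast<__m128i*>(p), packed);
}

// SSE2 version: eight pixels per iteration; n must be a multiple of 8.
void yuv_to_rgb_sse2(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                     uint8_t* r, uint8_t* g, uint8_t* b, std::size_t n) {
    assert(n % 8 == 0);
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i c = _mm_sub_epi16(load8_as_epi16(y + i), _mm_set1_epi16(16));
        c = _mm_add_epi16(_mm_mullo_epi16(c, _mm_set1_epi16(kY)), _mm_set1_epi16(32));
        __m128i d = _mm_sub_epi16(load8_as_epi16(u + i), _mm_set1_epi16(128));
        __m128i e = _mm_sub_epi16(load8_as_epi16(v + i), _mm_set1_epi16(128));

        // Saturating adds keep out-of-range sums above 255 after the shift,
        // so the final unsigned pack clamps them exactly as the scalar code does.
        __m128i rr = _mm_adds_epi16(c, _mm_mullo_epi16(e, _mm_set1_epi16(kRV)));
        __m128i guv = _mm_add_epi16(_mm_mullo_epi16(d, _mm_set1_epi16(kGU)),
                                    _mm_mullo_epi16(e, _mm_set1_epi16(kGV)));
        __m128i gg = _mm_subs_epi16(c, guv);
        __m128i bb = _mm_adds_epi16(c, _mm_mullo_epi16(d, _mm_set1_epi16(kBU)));

        store8_from_epi16(r + i, rr);
        store8_from_epi16(g + i, gg);
        store8_from_epi16(b + i, bb);
    }
}
```

A real decoder would normally output 4:2:0 frames and write packed RGB, which adds chroma upsampling and interleaving steps not shown here.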

3. Optimized by Intel® Threading Building Blocks (Intel® TBB) tool:

When we ran the Intel® VTune™ tool, we found that Strongene’s codec created four threads. However, the fastest thread had to wait for the slowest thread, leaving cores idle.

Used alone, Intel® SSE only speeds up the code running on a single core. Using Intel® TBB together with Intel® SSE lets the code run across multiple cores, improving performance further.

We restructured their multi-threaded code into smaller tasks, then used Intel® TBB to schedule those tasks onto idle cores so that all cores are fully utilized.

Intel TBB can be downloaded from http://threadingbuildingblocks.org/download.
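
The idea of replacing a few fixed threads with many small tasks can be sketched as follows. This is a minimal illustration, not Strongene’s code, and it deliberately uses only std::thread and an atomic counter instead of Intel TBB so that it builds without extra libraries. The names process_task and run_tasks and the synthetic per-task workload are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical unit of decoder work, e.g. one row of blocks in a frame.
// The cost varies per task, which is what unbalances a fixed per-thread split.
long process_task(std::size_t index) {
    long acc = 0;
    const std::size_t steps = (index % 7 + 1) * 1000;
    for (std::size_t i = 0; i < steps; ++i) {
        acc += static_cast<long>(i % 3);
    }
    return acc;
}

// Run task_count tasks on worker_count threads. Each worker repeatedly claims
// the next unclaimed task, so a worker that finishes early takes more work
// instead of waiting for the slowest one.
std::vector<long> run_tasks(std::size_t task_count, unsigned worker_count) {
    assert(worker_count > 0);
    std::vector<long> results(task_count, 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= task_count) {
                break;
            }
            results[i] = process_task(i);  // each index is written by one worker only
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < worker_count; ++w) {
        pool.emplace_back(worker);
    }
    for (auto& t : pool) {
        t.join();
    }
    return results;
}
```

With Intel TBB itself, a loop of independent tasks like this would more typically be written with tbb::parallel_for, which leaves the distribution of work across cores to the library’s scheduler.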

H.265/HEVC Decoder Performance Comparison[2]

Our tests showed that optimization with YASM and the Intel® ICC compiler gives up to a 1.5x performance improvement, optimization with Intel® SSE gives up to a 6x improvement compared to the C code, and optimization with Intel® TBB gives up to a 2.6x improvement.

We used the Intel® Graphics Performance Analyzers (Intel® GPA) tool to measure the refresh rate when playing video. Without optimization, the average refresh rate on the Lenovo K900 when playing a 1080p HEVC video was 11.7 FPS (frames per second). After optimization with the above methods, the average refresh rate on the Lenovo K900 reached up to 29.6 FPS on the same video.

Figure 5: Performance comparison

When we tested the optimized H.265/HEVC decoder on a tablet based on the platform codenamed Bay Trail, the performance was better than on the Lenovo K900 phone: the refresh rate reached 52.6 FPS.

If we cap the refresh rate at 24 FPS on the Bay Trail tablet when playing the 1080p video, the CPU workload is less than 35%. We therefore readily recommend Strongene’s HEVC decoder solution to the popular online video providers in the PRC for commercial use.

Summary

Multimedia apps are very popular on Android phones and tablets, and their performance is very important for user experience. A series of tools is available to optimize the performance of Android apps on Intel® Atom™ processor-based platforms. The H.265/HEVC decoder can be optimized with these tools to obtain even better performance on Intel® Atom™ processor-based platforms.

Recommendations for ISVs in the new Android world:

Don’t hesitate to optimize your multimedia apps with YASM, Intel® ICC, Intel® SSE, and Intel® TBB. These trusted tools can provide substantial performance boosts.

Related links and resources

To learn more about Intel tools for the Android developer, visit Intel® Developer Zone for Android.

Reference

  1. http://www.strongene.com/en/homepage.jsp
  2. http://yasm.tortall.net
  3. http://threadingbuildingblocks.org

 


[1][2] Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Configurations: [describe config + what test used + who did testing]. For more information go to http://www.intel.com/performance

For more complete information about compiler optimizations, see our Optimization Notice.