Evolution of Hardware HEVC Encode on 10th Generation Intel® Core™ Processors

By Narifumi Iwamoto, Min zhi Sun, Ragha Khandenahally, and Sonal Sharma

Published: 12/04/2019   Last Updated: 12/04/2019

Achieve efficient, flexible, and faster media creation and distribution with a high-quality, high-performance hardware encoder.

Executive Summary

Video compression technologies play a key role in the creation and distribution of high-quality video content. Cloud-based video distribution and video analytics workloads are growing exponentially. This paper reports the current video encode quality of a key revision to the Intel codec IP available in 10th generation Intel® Core™ processors. The analysis evaluates the encode quality of HEVC [1] codecs using an objective evaluation methodology.

Challenges for Video Encoding

Video compression is a highly complex process defined by international standards. Software-based encoding takes a significant amount of time or consumes a lot of power, which has a big impact on battery life in laptop and mobile use cases.

Hardware-based encoding, on the other hand, faces different challenges than software-based solutions in terms of quality and configuration flexibility. For a decade, starting with the 1st generation Intel Core processor, Intel has led efforts to address these challenges, continuously improving encoding quality, performance, and configuration flexibility. In the 10th generation Intel Core processor, new encoder logic dramatically improves the video quality of HEVC hardware encoding. This paper evaluates the HEVC hardware encoding quality of 10th generation Intel Core processors using an industry-standard methodology.

Quality Evaluation Methodology

Approach

To conduct a fair video quality assessment, common conditions and software configurations need to be defined across encoders. We used a standardized methodology that has been used by video coding standardization groups such as the Joint Collaborative Team on Video Coding (JCT-VC) [2]. Based on this methodology, we defined the following configurations and conditions and used the HEVC Test Model (HM) [3] as the anchor encoder software.

Target Encoders

We used three encoders for this quality assessment as shown in the following table.

Table 1: Target encoders

Target encoder | Components | Availability
--- | --- | ---
HEVC Test Model 14.0 (HM14) | 14.0 | https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/archived/HM-14.0-dev/
10th generation Intel® Core™ processor and Intel® Media SDK sample encode | Intel® Core™ i7-1065G7 processor; Intel® Iris® Plus graphics driver 26.20.100.7372; encoding sample version 8.4.27.31 | Driver: https://downloadcenter.intel.com/ ; sample: https://github.com/Intel-Media-SDK/MediaSDK
FFMPEG with libx265 library enabled | 4.1.3 static | https://ffmpeg.zeranoe.com/builds/win64/static/ffmpeg-4.1.3-win64-static.zip (2019-Apr-26 16:10)

Configurations—Low Delay and Random Access

Two configurations are selected for this quality assessment to cover various encoding use cases. The first is Low Delay (LD), which uses forward prediction only (references to past frames) to cover low-latency use cases such as video conferencing. In this configuration, no frame re-ordering is required at decoding time, because the coding order matches the display order. The second is Random Access (RA), which uses both forward and backward references for better quality. In this configuration, frames must be re-ordered at the decoding stage.
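To make the re-ordering difference concrete, the following Python sketch takes the coding order of one GOP from the POC column of the HM14.0 configuration snippets shown later in this section (Random Access codes POCs 8, 4, 2, 1, 3, 6, 5, 7; Low Delay codes POCs 1, 2, 3, 4) and measures how many frames must be held back before display. The helper names are hypothetical; `num_reorder_pics` follows the idea behind HEVC's `sps_max_num_reorder_pics`, evaluated within a single GOP only.

```python
# Coding (decode) order of one GOP, copied from the POC column of the
# HM14.0 configuration snippets: Random Access uses GOPSize 8, Low Delay 4.
RA_CODING_ORDER = [8, 4, 2, 1, 3, 6, 5, 7]
LD_CODING_ORDER = [1, 2, 3, 4]


def needs_reordering(coding_order):
    """True if frames are coded in a different order than they are displayed."""
    return coding_order != sorted(coding_order)


def num_reorder_pics(coding_order):
    """Maximum number of frames that precede a frame in coding order but
    follow it in display order (evaluated within one GOP only)."""
    worst = 0
    for i, poc in enumerate(coding_order):
        later_in_display = sum(1 for prev in coding_order[:i] if prev > poc)
        worst = max(worst, later_in_display)
    return worst


if __name__ == "__main__":
    for name, order in (("Random Access", RA_CODING_ORDER),
                        ("Low Delay", LD_CODING_ORDER)):
        print(f"{name}: reordering={needs_reordering(order)}, "
              f"reorder depth={num_reorder_pics(order)}")
```

In the Random Access GOP, POC 1 is decoded only after POCs 8, 4, and 2, so up to three decoded frames wait in the buffer; in Low Delay, every frame can be displayed as soon as it is decoded.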

Quantization Parameter

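The transform coefficients of the compressed video signal are quantized with a step size controlled by the Quantization Parameter (Qp). A smaller Qp value means finer quantization, higher quality, and larger frame sizes; a larger Qp value means coarser quantization and smaller frame sizes.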
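In HEVC, as in H.264/AVC, the quantizer step size grows exponentially with Qp: it roughly doubles for every increase of 6, following the commonly cited approximation Qstep ≈ 2^((Qp − 4) / 6). The sketch below evaluates this approximation for the I frame Qp values used in this paper. It illustrates the trend only; real encoders and decoders implement quantization with integer scaling tables, not this floating-point formula.

```python
def qstep(qp: int) -> float:
    """Approximate HEVC quantizer step size for a given Qp.

    Textbook relation Qstep = 2 ** ((Qp - 4) / 6): Qstep is 1 at Qp 4
    and doubles every 6 Qp steps.
    """
    return 2.0 ** ((qp - 4) / 6.0)


QP_VALUES = [18, 21, 23, 25, 27, 29, 32, 37, 42, 47]

if __name__ == "__main__":
    for qp in QP_VALUES:
        print(f"Qp {qp:2d} -> Qstep ~ {qstep(qp):7.2f}")
```

Because of this exponential relation, the Qp range 18 to 47 spans step sizes that differ by a factor of about 2^(29/6), roughly 28x, which is why it covers a wide range of bitrates.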

Figure: Flow of the quantization process

We defined 10 different Qp values for the intra frames in this quality assessment to cover a wide range of bitrates. To prevent differences between rate control implementations from affecting quality, we used a constant Qp for each frame, as in the standardized methodology.

Table 2: Qp value and intra frame period definition.

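Target | I frame Qp values | Intra frame period
--- | --- | ---
High bitrate | 18 / 21 / 23 | Low Delay: 60 frames; Random Access: 64 frames [4]
Medium bitrate | 25 / 27 / 29 / 32 | Low Delay: 60 frames; Random Access: 64 frames [4]
Low bitrate | 32 / 37 / 42 / 47 | Low Delay: 60 frames; Random Access: 64 frames [4]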
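Qp 32 is listed in both the medium and low bitrate groups, so the three groups together contain 10 distinct values. The short Python check below de-duplicates the groups and also confirms that the Random Access intra period is a multiple of the 8-frame B-pyramid mini-GOP, as required by footnote 4. The variable names are illustrative only.

```python
# I frame Qp values per bitrate group, from Table 2.
QP_TIERS = {
    "high": [18, 21, 23],
    "medium": [25, 27, 29, 32],
    "low": [32, 37, 42, 47],
}
# Intra frame periods from Table 2.
INTRA_PERIOD = {"low_delay": 60, "random_access": 64}
RA_MINI_GOP = 8  # B-pyramid mini-GOP size used for Random Access

# Qp 32 appears in two groups, so a set removes the duplicate.
unique_qps = sorted({qp for tier in QP_TIERS.values() for qp in tier})

# Footnote 4: the Random Access intra period must be a multiple of the mini-GOP.
assert INTRA_PERIOD["random_access"] % RA_MINI_GOP == 0

print(unique_qps)       # [18, 21, 23, 25, 27, 29, 32, 37, 42, 47]
print(len(unique_qps))  # 10
```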

The Qp offsets for P and B frames, as well as the GOP (group of pictures) structure, are defined by each encoder according to its capabilities. Because each encoder's GOP configuration capability is part of what this assessment measures, only the I frame Qp value and the intra frame period are fixed as common conditions across target encoders.

Video Sequences and Configurations

Input video sequences are defined in Table 3. All the sequences are available to the public at https://media.xiph.org/video/derf/.

Table 3: Video sequences and configuration definition.

No. | Sequence name | Resolution | Frame count | Frame rate (fps) | Bit depth | Configuration
--- | --- | --- | --- | --- | --- | ---
1 | ParkJoy | 1920 x 1080 | 400 | 50 | 8 | Main Profile, Random Access
2 | RushFieldCuts | 1920 x 1080 | 400 | 30 | 8 | Main Profile, Random Access
3 | Counter-Strike: Global Offensive* | 1920 x 1080 | 400 | 60 | 8 | Main Profile, Random Access
4 | Minecraft* | 1920 x 1080 | 400 | 60 | 8 | Main Profile, Low Delay
5 | CrowdRun 4K | 3840 x 2160 | 400 | 50 | 8 | Main Profile, Random Access
6 | CrowdRun 4K | 3840 x 2160 | 400 | 50 | 8 | Main Profile, Low Delay
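The encoder command lines below pass width, height, and pixel format explicitly, which implies the sources are read as raw planar YUV 4:2:0 files. Assuming that layout, the Python sketch below derives the expected raw file size and clip duration for each row of Table 3; the duration is the same quantity used later in the bitrate formula. The `Sequence` class is hypothetical and only mirrors the table columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """One row of Table 3 (name, geometry, length, frame rate, bit depth)."""
    name: str
    width: int
    height: int
    frames: int
    fps: int
    bit_depth: int = 8

    def frame_bytes(self) -> int:
        # 4:2:0: a full-resolution luma plane plus two quarter-size chroma
        # planes = 1.5 samples per pixel; >8-bit samples take 2 bytes each.
        bytes_per_sample = 1 if self.bit_depth == 8 else 2
        return self.width * self.height * 3 // 2 * bytes_per_sample

    def raw_bytes(self) -> int:
        return self.frame_bytes() * self.frames

    def duration_s(self) -> float:
        return self.frames / self.fps


SEQUENCES = [
    Sequence("ParkJoy", 1920, 1080, 400, 50),
    Sequence("RushFieldCuts", 1920, 1080, 400, 30),
    Sequence("Counter-Strike: Global Offensive", 1920, 1080, 400, 60),
    Sequence("Minecraft", 1920, 1080, 400, 60),
    Sequence("CrowdRun 4K", 3840, 2160, 400, 50),
]

if __name__ == "__main__":
    for seq in SEQUENCES:
        print(f"{seq.name}: {seq.raw_bytes():,} bytes, {seq.duration_s():.2f} s")
```

A mismatch between the actual file size and `raw_bytes()` is a quick way to catch a wrong resolution or pixel format before spending time on encodes.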

Tested Command Line and Configurations

Each encoder defines its GOP structure according to its capabilities, while the I frame Qp value and intra frame period are locked as defined in Table 2 for a fair comparison. The tested command lines and configurations for each encoder follow.

HM14.0 Tested command line:

>TAppEncoderStatic -c {encoder_lowdelay_main.cfg|encoder_randomaccess_main.cfg} -f NUM_OF_FRAMES -fs 0 -fr FRAME_RATE --InputBitDepth=8|10 --OutputBitDepth=8|10 -wdt VIDEO_WIDTH -hgt VIDEO_HEIGHT -i INPUT_FILE_NAME -q QP_VALUE -b OUT_FILE_NAME
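The `|` alternatives in the command line stand for separate runs, one per configuration, bit depth, and Qp value. A minimal Python sketch that expands them into concrete argument lists (printed, not executed) might look like the following. The binary name `TAppEncoderStatic`, the input file name, and the `_qpNN.bin` output naming are assumptions for illustration; adjust them to your HM build and file layout.

```python
import shlex

QP_VALUES = [18, 21, 23, 25, 27, 29, 32, 37, 42, 47]


def hm_command(cfg, src, width, height, frames, fps, qp, bit_depth=8):
    """Build one HM encoder invocation as an argument list (not executed)."""
    stem = src.rsplit(".", 1)[0]
    return [
        "TAppEncoderStatic",           # assumed binary name; depends on the build
        "-c", cfg,                     # low delay or random access .cfg file
        "-f", str(frames),             # number of frames to encode
        "-fs", "0",                    # frames to skip at the start of the input
        "-fr", str(fps),
        f"--InputBitDepth={bit_depth}",
        f"--OutputBitDepth={bit_depth}",
        "-wdt", str(width),
        "-hgt", str(height),
        "-i", src,
        "-q", str(qp),                 # base (I frame) Qp
        "-b", f"{stem}_qp{qp}.bin",    # hypothetical output naming scheme
    ]


if __name__ == "__main__":
    for qp in QP_VALUES:
        cmd = hm_command("encoder_randomaccess_main.cfg", "ParkJoy_1080p50.yuv",
                         1920, 1080, 400, 50, qp)
        print(shlex.join(cmd))
```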

HM14.0 Random Access configuration snippet. B-pyramid structure is used for higher quality:

#======== Coding Structure =============
IntraPeriod                   : 64          # Period of I-Frame ( -1 = only first)
DecodingRefreshType           : 0           # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize                       : 8           # GOP Size (number of B slice = GOPSize-1)
Frame1:  B    8   1    0.442    0     0     0     4     4     -8 -10 -12 -16   0 
Frame2:  B    4   2    0.3536   0     0     0     2     3     -4 -6  4         1    4    5    1 1 0 0 1
Frame3:  B    2   3    0.3536   0     0     0     2     4     -2 -4  2 6       1    2    4    1 1 1 1  
Frame4:  B    1   4    0.68     0     0     1     2     4     -1  1  3 7       1    1    5    1 0 1 1 1 
Frame5:  B    3   4    0.68     0     0     1     2     4     -1 -3  1 5       1   -2    5    1 1 1 1 0
Frame6:  B    6   3    0.3536   0     0     0     2     4     -2 -4 -6 2       1   -3    5    1 1 1 1 0
Frame7:  B    5   4    0.68     0     0     1     2     4     -1 -5  1 3       1    1    5    1 0 1 1 1  
Frame8:  B    7   4    0.68     0     0     1     2     4     -1 -3 -7 1       1   -2    5    1 1 1 1 0
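In each GOP line above, the second column is the POC (display position within the GOP) and the third is QPoffset. As we understand HM's convention, QPoffset is added to the base Qp passed with `-q`, while the intra frame is coded at the base Qp itself. The hypothetical helper below applies that rule to the Random Access GOP: frames deeper in the B-pyramid, which fewer frames reference, receive larger offsets.

```python
# (POC, QPoffset) for Frame1..Frame8 of the Random Access GOP above,
# in coding order.
RA_GOP = [(8, 1), (4, 2), (2, 3), (1, 4), (3, 4), (6, 3), (5, 4), (7, 4)]


def frame_qps(base_qp, gop=RA_GOP):
    """Return {POC: QP} for one GOP, assuming QP = base Qp + QPoffset."""
    return {poc: base_qp + offset for poc, offset in gop}


if __name__ == "__main__":
    qps = frame_qps(27)  # I frame Qp 27 from the medium bitrate group
    for poc in sorted(qps):
        print(f"POC {poc}: QP {qps[poc]}")
```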

HM14.0 Low Delay configuration snippet:

#======== Coding Structure =============
IntraPeriod                   : 60          # Period of I-Frame ( -1 = only first)
DecodingRefreshType           : 0           # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize                       : 4           # GOP Size (number of B slice = GOPSize-1)
Frame1:  B    1   3     0.4624   0     0      0     4     4    -1 -5 -9 -13     0
Frame2:  B    2   2     0.4624   0     0      0     4     4    -1 -2 -6 -10     1  -1  5  1 1 1 0 1
Frame3:  B    3   3     0.4624   0     0      0     4     4    -1 -3 -7 -11     1  -1  5  0 1 1 1 1            
Frame4:  B    4   1     0.578    0     0      0     4     4    -1 -4 -8 -12     1  -1  5  0 1 1 1 1
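The reference picture set in each GOP line is given as POC deltas relative to the current frame. A negative delta points to an earlier frame in display order and a positive delta to a later one. The Python check below, with deltas copied from both snippets, confirms that the Low Delay GOP refers only to past frames while the Random Access GOP also refers to future frames, which is why only Random Access needs frame re-ordering. The helper name is hypothetical.

```python
# Reference picture deltas (reference POC minus current POC), keyed by POC
# and copied from the reference picture columns of the HM14.0 snippets.
LD_REFS = {
    1: [-1, -5, -9, -13],
    2: [-1, -2, -6, -10],
    3: [-1, -3, -7, -11],
    4: [-1, -4, -8, -12],
}
RA_REFS = {
    8: [-8, -10, -12, -16],
    4: [-4, -6, 4],
    2: [-2, -4, 2, 6],
    1: [-1, 1, 3, 7],
    3: [-1, -3, 1, 5],
    6: [-2, -4, -6, 2],
    5: [-1, -5, 1, 3],
    7: [-1, -3, -7, 1],
}


def uses_future_refs(gop_refs):
    """True if any frame references a picture later in display order."""
    return any(delta > 0 for deltas in gop_refs.values() for delta in deltas)


if __name__ == "__main__":
    print("Low Delay uses future references:", uses_future_refs(LD_REFS))      # False
    print("Random Access uses future references:", uses_future_refs(RA_REFS))  # True
```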

10th Generation Intel® Core™ processor and Intel® Media SDK Command Line and Configuration

Intel Media SDK supports a B-pyramid GOP structure for the Random Access configuration to achieve higher quality. Figure 1 illustrates this B-pyramid GOP structure, which is similar to that of HM14.0. In the B-pyramid configuration, each B frame inside a mini-GOP has its own Qp offset, and the reference structure is designed to maximize quality while minimizing video stream size.
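The POC order of a B-pyramid follows a dyadic (bisecting) pattern that can be generated instead of typed by hand. The Python sketch below shows one common way to build it: code the anchor frame at the end of the mini-GOP first, then repeatedly code the midpoint of each remaining interval. For a mini-GOP of 8 it reproduces HM14.0's coding order, and using pyramid level + 1 as the Qp offset reproduces HM14.0's QPoffset column. It is an illustration only, not Intel Media SDK's internal algorithm; `b_pyramid` is a hypothetical name.

```python
def b_pyramid(gop_size):
    """Return [(poc, level), ...] in coding order for a dyadic B-pyramid.

    The anchor frame at POC gop_size is level 0; each bisection step
    adds one pyramid level.
    """
    order = [(gop_size, 0)]

    def split(lo, hi, level):
        if hi - lo < 2:
            return  # no frame strictly between lo and hi
        mid = (lo + hi) // 2
        order.append((mid, level))
        split(lo, mid, level + 1)  # left half first, as in HM's order
        split(mid, hi, level + 1)

    split(0, gop_size, 1)
    return order


if __name__ == "__main__":
    for poc, level in b_pyramid(8):
        print(f"POC {poc}: level {level}, Qp offset {level + 1}")
```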

Figure 1: B-pyramid GOP structure supported by Intel® Media SDK.


The following command lines enable this feature for the Random Access configuration and set up the Low Delay configuration.

Command line for Random Access configuration:

>sample_encode.exe h265 -i INPUT_FILE_NAME -w VIDEO_WIDTH -h VIDEO_HEIGHT -u veryslow|medium|veryfast -f FRAME_RATE -o OUT_FILE_NAME -hw -async 3 -g 64 -n NUM_OF_FRAMES -cqp -qpi 18|21|23|25|27|29|32|37|42|47 -r 8 -bref

Command line for Low Delay configuration:

>sample_encode.exe h265 -i INPUT_FILE_NAME -w VIDEO_WIDTH -h VIDEO_HEIGHT -u veryslow|medium|veryfast -f FRAME_RATE -o OUT_FILE_NAME -hw -async 3 -g 60 -n NUM_OF_FRAMES -cqp -qpi 18|21|23|25|27|29|32|37|42|47 -preset conference
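Each `-qpi` value and each `-u` target usage in these command lines corresponds to a separate run. The Python sketch below builds one argument list per run using only the options shown above: Random Access adds `-r 8 -bref`, and Low Delay adds `-preset conference`. The function name and the file names are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def sample_encode_cmd(src, out, width, height, fps, frames, qp,
                      usage="veryslow", random_access=True):
    """Build one Intel Media SDK sample_encode invocation (not executed)."""
    cmd = [
        "sample_encode.exe", "h265",
        "-i", src, "-w", str(width), "-h", str(height),
        "-u", usage,                             # veryslow, medium, or veryfast here
        "-f", str(fps), "-o", out,
        "-hw", "-async", "3",
        "-g", "64" if random_access else "60",   # intra period from Table 2
        "-n", str(frames),
        "-cqp", "-qpi", str(qp),                 # constant Qp, I frame value
    ]
    if random_access:
        cmd += ["-r", "8", "-bref"]              # as in the Random Access command line
    else:
        cmd += ["-preset", "conference"]         # as in the Low Delay command line
    return cmd


if __name__ == "__main__":
    for qp in [18, 21, 23, 25, 27, 29, 32, 37, 42, 47]:
        cmd = sample_encode_cmd("ParkJoy_1080p50.yuv", f"ParkJoy_qp{qp}.h265",
                                1920, 1080, 50, 400, qp)
        print(shlex.join(cmd))
```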

FFMPEG and libx265 Command Line Configuration

The following x265 software encoder command lines were used in this quality evaluation.

Command line for Low Delay configuration (8/10 bit):

>ffmpeg.exe -y -s VIDEO_WIDTHxVIDEO_HEIGHT -pix_fmt yuv420p|yuv420p10le -r FRAME_RATE -i INPUT_FILE_NAME -frames:v NUM_OF_FRAMES -g 60 -bsf:v hevc_mp4toannexb -c:v libx265 -preset veryslow|medium|veryfast -profile main|main10 -x265-params "qp=QP_VALUE:aq-mode=0:b-adapt=0:bframes=0:b-pyramid=1:tune=psnr:no-scenecut=1:no-open-gop=1:input-depth=8|10:output-depth=8|10" OUT_FILE_NAME

Command line for Random Access configuration (8/10 bit):

>ffmpeg.exe -y -s VIDEO_WIDTHxVIDEO_HEIGHT -pix_fmt yuv420p|yuv420p10le -r FRAME_RATE -i INPUT_FILE_NAME -frames:v NUM_OF_FRAMES -g 64 -bf 7 -bsf:v hevc_mp4toannexb -c:v libx265 -preset veryslow|medium|veryfast -profile main|main10 -x265-params "qp=QP_VALUE:aq-mode=0:b-adapt=0:bframes=7:b-pyramid=1:tune=psnr:no-scenecut=1:no-open-gop=1:input-depth=8|10:output-depth=8|10" OUT_FILE_NAME
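The two x265 command lines differ only in `-g`, `-bf`, and the `bframes` entry of `-x265-params`. The Python sketch below assembles both variants from the options shown above. Because the colon-separated parameter string is passed as a single list element, no shell quoting or escaping is needed. Function and file names are hypothetical, and the commands are only built, not run.

```python
import shlex


def x265_params(qp, random_access, bit_depth=8):
    """Colon-separated -x265-params value, as in the command lines above."""
    params = {
        "qp": qp,
        "aq-mode": 0,
        "b-adapt": 0,
        "bframes": 7 if random_access else 0,
        "b-pyramid": 1,
        "tune": "psnr",
        "no-scenecut": 1,
        "no-open-gop": 1,
        "input-depth": bit_depth,
        "output-depth": bit_depth,
    }
    return ":".join(f"{key}={value}" for key, value in params.items())


def ffmpeg_x265_cmd(src, out, width, height, fps, frames, qp,
                    preset="veryslow", random_access=True, bit_depth=8):
    """Build one ffmpeg/libx265 invocation as an argument list (not executed)."""
    cmd = ["ffmpeg", "-y", "-s", f"{width}x{height}",
           "-pix_fmt", "yuv420p" if bit_depth == 8 else "yuv420p10le",
           "-r", str(fps), "-i", src,
           "-frames:v", str(frames), "-g", "64" if random_access else "60"]
    if random_access:
        cmd += ["-bf", "7"]
    cmd += ["-bsf:v", "hevc_mp4toannexb", "-c:v", "libx265", "-preset", preset,
            "-profile", "main" if bit_depth == 8 else "main10",
            "-x265-params", x265_params(qp, random_access, bit_depth), out]
    return cmd


if __name__ == "__main__":
    print(shlex.join(ffmpeg_x265_cmd("ParkJoy_1080p50.yuv", "ParkJoy_qp27.hevc",
                                     1920, 1080, 50, 400, 27)))
```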

Quality Evaluation Result

This section describes the common methodology used for quality evaluation for each video encoder. Bitrate, measured in bits per second (bps), is calculated by the following formula from each encoded video stream:

Bitrate (bps) = encoded video stream file size (bytes) * 8 / (total frame count / frame rate)
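The formula translates directly into code: the denominator is the clip duration in seconds. The minimal Python sketch below applies it to a byte count or to an encoded stream file on disk. Function names are hypothetical, and the numbers in the example are made up for illustration.

```python
import os


def bitrate_bps(stream_bytes, frame_count, frame_rate):
    """Bitrate = size in bytes * 8 / (frame count / frame rate)."""
    duration_s = frame_count / frame_rate
    return stream_bytes * 8 / duration_s


def bitrate_of_file(path, frame_count, frame_rate):
    """Apply the formula to an encoded elementary stream on disk."""
    return bitrate_bps(os.path.getsize(path), frame_count, frame_rate)


if __name__ == "__main__":
    # Hypothetical example: a 2,500,000-byte stream of 400 frames at 50 fps
    # lasts 8 s, so its bitrate is 2,500,000 * 8 / 8 = 2,500,000 bps.
    print(bitrate_bps(2_500_000, 400, 50))  # 2500000.0
```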

The average peak signal-to-noise ratio (PSNR) over all frames of each sequence is calculated against the original uncompressed video using the following command:

>ffmpeg.exe -framerate FRAME_RATE -r FRAME_RATE -i INPUT_FILE_NAME_1 -vcodec rawvideo -pix_fmt yuv420p|yuv420p10le -s:v VIDEO_WIDTHxVIDEO_HEIGHT -r FRAME_RATE -i INPUT_FILE_NAME_2 -frames:v NUM_OF_FRAMES -lavfi psnr|ssim -f null -
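For readers who want to see what the PSNR measurement computes, the Python sketch below calculates the luma (Y) PSNR of two raw 8-bit yuv420p files frame by frame, using PSNR = 10 * log10(255^2 / MSE), and averages the per-frame values. It assumes both inputs are already decoded to raw YUV with identical size and frame order. Averaging per-frame PSNR is one common convention; tools differ in how they aggregate (for example, averaging PSNR versus averaging MSE), so results may not match FFmpeg's summary line exactly. Function names are hypothetical.

```python
import math


def psnr(ref, dist, max_val=255):
    """PSNR in dB between two equally sized 8-bit sample sequences."""
    if len(ref) != len(dist):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(max_val ** 2 / mse)


def average_luma_psnr(ref_path, dist_path, width, height, frames):
    """Mean of per-frame Y-plane PSNR for two raw 8-bit yuv420p files."""
    luma_size = width * height
    frame_size = luma_size * 3 // 2  # Y plus quarter-size U and V planes
    values = []
    with open(ref_path, "rb") as ref_file, open(dist_path, "rb") as dist_file:
        for _ in range(frames):
            ref = ref_file.read(frame_size)
            dist = dist_file.read(frame_size)
            if len(ref) < frame_size or len(dist) < frame_size:
                break  # stop at the end of the shorter file
            values.append(psnr(ref[:luma_size], dist[:luma_size]))
    if not values:
        raise ValueError("no complete frames read")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Every sample off by 1 gives MSE = 1, so PSNR = 20 * log10(255) ~ 48.13 dB.
    print(round(psnr(bytes([100] * 16), bytes([101] * 16)), 2))  # 48.13
```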

We plot the bitrate and average PSNR values of every output stream to create a Rate-Distortion curve (RD-curve) and compare quality between target encoders. To quantify compression gains, we also used the Bjøntegaard Delta Bitrate (BD-Rate) [5]. This metric computes the average distance between two RD curves along the bitrate axis over the PSNR range they share, and expresses how much bitrate an encoder saves (or additionally needs) relative to a base encoder at the same quality. We used the HM14.0 encoder as the base; in the BD-Rate chart, a lower bar indicates better compression efficiency.
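The BD-Rate calculation can be sketched in pure Python as it is commonly implemented: fit a cubic polynomial of log(bitrate) as a function of PSNR for each encoder, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference back to a percentage. The PSNR axis is normalized before fitting for numerical stability, and with more than four points, such as the 10 Qp values used here, the fit becomes least-squares. The function names and sample numbers are hypothetical and are not the data behind the charts in this paper.

```python
import math


def _fit_cubic(xs, ys):
    """Least-squares fit y ~ c0 + c1*t + c2*t^2 + c3*t^3 with
    t = (x - center) / scale, normalized for numerical stability."""
    center = sum(xs) / len(xs)
    scale = (max(xs) - min(xs)) / 2 or 1.0
    ts = [(x - center) / scale for x in xs]
    n = 4
    # Normal equations A c = b.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (b[r] - tail) / a[r][r]
    return coef, center, scale


def _integrate(fit, lo, hi):
    """Integral of the fitted cubic over x in [lo, hi] (dx = scale * dt)."""
    coef, center, scale = fit
    t_lo, t_hi = (lo - center) / scale, (hi - center) / scale
    return scale * sum(c / (k + 1) * (t_hi ** (k + 1) - t_lo ** (k + 1))
                       for k, c in enumerate(coef))


def bd_rate(anchor, test):
    """BD-Rate of `test` relative to `anchor`, in percent.

    Each curve is a list of (bitrate_bps, psnr_db) points, at least 4.
    Positive values mean `test` needs more bits for the same PSNR.
    """
    fits, ranges = [], []
    for curve in (anchor, test):
        psnr = [p for _, p in curve]
        log_rate = [math.log(r) for r, _ in curve]
        fits.append(_fit_cubic(psnr, log_rate))
        ranges.append((min(psnr), max(psnr)))
    lo = max(ranges[0][0], ranges[1][0])
    hi = min(ranges[0][1], ranges[1][1])
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    avg_diff = (_integrate(fits[1], lo, hi) - _integrate(fits[0], lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1) * 100


if __name__ == "__main__":
    # Hypothetical curves: the test encoder needs 25% more bits at every PSNR.
    anchor = [(1e6, 32.0), (2e6, 35.0), (4e6, 38.0), (8e6, 41.0)]
    test = [(r * 1.25, p) for r, p in anchor]
    print(f"{bd_rate(anchor, test):.2f}")  # 25.00
```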

ParkJoy 1080p - Random Access

Figure: RD-curve comparison, ParkJoy 1080p, Random Access

RushFieldCuts 1080p - Random Access

Figure: RD-curve comparison, RushFieldCuts 1080p, Random Access

Counter-Strike: Global Offensive 1080p - Random Access

Figure: RD-curve comparison, Counter-Strike: Global Offensive 1080p, Random Access

Minecraft 1080p - Low Delay

Figure: RD-curve comparison, Minecraft 1080p, Low Delay

CrowdRun 3840x2160 - Random Access

Figure: RD-curve comparison, CrowdRun 3840 x 2160, Random Access

CrowdRun 3840x2160 - Low Delay

Figure: RD-curve comparison, CrowdRun 3840 x 2160, Low Delay

Quality Evaluation Summary

The new 10th generation Intel Core processor hardware HEVC encoder delivers superior-quality HEVC video streams. Its results come close to those of the HM test model, the HEVC reference software that is widely used as a high-quality anchor: across the six contents and configurations defined here, its BD-Rate difference relative to HM14.0 at the same quality level is only 22.9 percent to 32.6 percent. The differences are also reasonably consistent across configurations, as expected.

Be ready to take advantage of this new feature and technology in your business.

Table 4: BD-Rate summary

Figure: BD-Rate (average bitrate difference) of each target encoder relative to the HM test model

Footnotes

  1. G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, December 2012.
  2. F. Bossen, "Common HM Test Conditions and Software Reference Configuration," Joint Collaborative Team on Video Coding, document JCTVC-L1100, Geneva, Switzerland, January 2013.
  3. "High Efficiency Video Coding Test Model Software Version 16," available at https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware, last accessed on October 1, 2017.
  4. Because some encoders require the intra period to be a multiple of 8 in an 8-frame B-pyramid GOP configuration, we defined an intra period of 64 frames.
  5. G. Bjøntegaard, "Calculation of Average PSNR Differences between RD-curves," Technical Report VCEG-M33, ITU-T SG16/Q6, Austin, Texas, USA, 2001.

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804