By Narifumi Iwamoto, Min zhi Sun, Ragha Khandenahally, and Sonal Sharma
Published: 12/04/2019 Last Updated: 12/04/2019
Achieve efficient, flexible, and faster media creation and distribution with a high-quality, high-performance hardware encoder.
Video compression technologies play a key role in the creation and distribution of high-quality video content, and cloud-based video distribution and video analytics workloads are growing exponentially. This paper reports the current state of video encode quality for a key revision of the Intel codec IP available in 10th generation Intel® Core™ processors. The analysis covers the encode quality of HEVC[1] codecs using an objective evaluation methodology.
Video compression is a highly complex process defined by international standards. Software-based encoding takes a significant amount of time or consumes a lot of power, which has a big impact on battery life in laptop and mobile use cases.
On the other hand, hardware-based encoding faces different challenges than software-based solutions in terms of quality and configuration flexibility. Intel has led the effort to address these challenges for a decade, starting with the 1st generation Intel Core processor, and has continuously improved encoding quality, performance, and configuration flexibility. In 10th generation Intel Core processors, new encoder logic dramatically improves the video quality of HEVC hardware encoding. This paper evaluates the HEVC hardware encoding quality delivered by 10th generation Intel Core processors using an industry-standard methodology.
To assess video quality fairly, common conditions and software configurations must be defined across encoders. We used a standardized methodology that has been used by video coding standardization groups such as the JCT-VC[2]. Based on this methodology, we defined the following configurations and conditions and used the HM Test Model[3] as the anchor encoder software.
We used three encoders for this quality assessment as shown in the following table.
Table 1: Target encoders
Target encoder | Components | Availability |
---|---|---|
HEVC Test Model 14.0 (HM14) | 14.0 | https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/archived/HM-14.0-dev/ |
10th generation Intel® Core™ processor and Intel® Media SDK sample encode | Intel® Core™ i7-1065G7 processor; Intel® Iris® Plus graphics driver 26.20.100.7372; encoding sample version 8.4.27.31 | Graphics driver: https://downloadcenter.intel.com/; sample: https://github.com/Intel-Media-SDK/MediaSDK |
FFmpeg with the libx265 library enabled | 4.1.3 static | https://ffmpeg.zeranoe.com/builds/win64/static/ffmpeg-4.1.3-win64-static.zip (2019-Apr-26 16:10) |
Two configurations were selected for this quality assessment to cover a range of encoding use cases. The first is Low Delay (LD), which uses forward references (past frames) only and covers low-latency use cases such as video conferencing. In this configuration, no frame reordering is needed at decode time, so the encoding and decoding frame orders match. The second is Random Access (RA), which uses both forward and backward references for better quality. This configuration requires frame reordering at the decoding stage.
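The difference in frame order can be sketched in a few lines. This is a minimal illustration, assuming a power-of-two GOP and a dyadic (binary-split) B-pyramid; the function names are hypothetical, and real encoders, including the ones tested here, may choose other structures. For a GOP of 8, the Random Access order it produces happens to match the POC column of the HM14.0 Random Access snippet later in this section.

```python
def low_delay_order(gop_size):
    """Low Delay: frames are coded in display order, so no reordering is needed."""
    return list(range(1, gop_size + 1))


def hierarchical_b_order(gop_size):
    """Random Access (dyadic B-pyramid): code the last frame of the GOP first,
    then recursively code the midpoint of each interval before its two halves.
    Frame numbers are display positions after the previous anchor frame at 0."""
    order = [gop_size]

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)
        split(lo, mid)
        split(mid, hi)

    split(0, gop_size)
    return order


print("LD:", low_delay_order(4))
print("RA:", hierarchical_b_order(8))
```

In the Random Access order, frame 8 must be decoded before frames 1 through 7 can be displayed, which is exactly the reordering (and added latency) that Low Delay avoids.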
Transformed residual signals are quantized using the quantization parameter (Qp). A smaller Qp means finer quantization and larger encoded frames; a larger Qp means coarser quantization and smaller encoded frames.
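In HEVC, as in H.264, the quantization step size roughly doubles for every increase of 6 in Qp, which is commonly approximated as Qstep ≈ 2^((Qp − 4)/6). The sketch below uses that floating-point approximation (the standard itself specifies integer scaling tables) to show how widely the Qp values in Table 2 span; the function name is hypothetical.

```python
def hevc_qstep(qp):
    """Approximate HEVC quantization step size: 1.0 at Qp 4, doubling every 6 Qp."""
    return 2.0 ** ((qp - 4) / 6.0)


# I-frame Qp values from Table 2 (duplicates removed).
for qp in (18, 21, 23, 25, 27, 29, 32, 37, 42, 47):
    print(f"Qp {qp:2d}: Qstep ~ {hevc_qstep(qp):7.2f}")
```

Under this approximation, moving from Qp 18 to Qp 47 increases the step size by a factor of about 28.5, which is why these Qp values cover such a wide range of bitrates.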
We defined 10 distinct I-frame Qp values to cover a wide range of bitrates. To rule out quality differences caused by rate-control implementations, we used constant Qp (CQP) encoding, as in the standardized methodology.
Table 2: Qp value and intra frame period definition.
Target | I-Frame Qp Values | Intra Frame Period |
---|---|---|
High bitrate | 18 / 21 / 23 | Low Delay: 60 frames; Random Access: 64 frames[4] |
Medium bitrate | 25 / 27 / 29 / 32 | Same as above |
Low bitrate | 32 / 37 / 42 / 47 | Same as above |
The Qp offsets for P and B frames, as well as the GOP (group of pictures) structure, are defined by each encoder according to its capabilities. Because GOP configuration capability is part of what this assessment measures, only the I-frame Qp value and the intra period are fixed as common conditions across the target encoders.
Input video sequences are defined in Table 3. All the sequences are available to the public at https://media.xiph.org/video/derf/.
Table 3: Video sequences and configuration definition.
No. | Sequence Name | Resolution | Frame Count | Frame Rate (fps) | Bit Depth | Configuration |
---|---|---|---|---|---|---|
1 | ParkJoy | 1920 x 1080 | 400 | 50 | 8 | Main Profile Random Access |
2 | RushFieldCuts | 1920 x 1080 | 400 | 30 | 8 | Main Profile Random Access |
3 | Counter-Strike: Global Offensive* | 1920 x 1080 | 400 | 60 | 8 | Main Profile Random Access |
4 | Minecraft* | 1920 x 1080 | 400 | 60 | 8 | Main Profile Low Delay |
5 | CrowdRun 4K | 3840 x 2160 | 400 | 50 | 8 | Main Profile Random Access |
6 | CrowdRun 4K | 3840 x 2160 | 400 | 50 | 8 | Main Profile Low Delay |
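Tables 2 and 3 together define the test matrix for each encoder and preset. The following minimal sketch, with hypothetical variable names, enumerates it. Merging the three Qp groups yields ten distinct values because Qp 32 appears in both the medium and low bitrate groups, so six sequences give 60 encodes per encoder and preset.

```python
from itertools import product

# Sequences from Table 3: (name, width, height, frames, fps, configuration).
SEQUENCES = [
    ("ParkJoy", 1920, 1080, 400, 50, "RA"),
    ("RushFieldCuts", 1920, 1080, 400, 30, "RA"),
    ("Counter-Strike: Global Offensive", 1920, 1080, 400, 60, "RA"),
    ("Minecraft", 1920, 1080, 400, 60, "LD"),
    ("CrowdRun 4K", 3840, 2160, 400, 50, "RA"),
    ("CrowdRun 4K", 3840, 2160, 400, 50, "LD"),
]

# I-frame Qp groups and intra periods from Table 2.
QP_GROUPS = {
    "high": [18, 21, 23],
    "medium": [25, 27, 29, 32],
    "low": [32, 37, 42, 47],
}
INTRA_PERIOD = {"LD": 60, "RA": 64}

# Union of the groups, sorted; the shared Qp 32 is counted once.
QP_VALUES = sorted({qp for group in QP_GROUPS.values() for qp in group})

# One encode job per (sequence, Qp) pair.
JOBS = [
    {"sequence": name, "size": (w, h), "frames": n, "fps": fps,
     "config": cfg, "intra_period": INTRA_PERIOD[cfg], "qp": qp}
    for (name, w, h, n, fps, cfg), qp in product(SEQUENCES, QP_VALUES)
]

print(QP_VALUES)
print(len(JOBS))
```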
Each encoder uses its own GOP structure, based on its capabilities, while the I-frame Qp value and intra period are locked to the values in Table 2 for a fair comparison. The tested command lines and configurations for each encoder follow.
HM14.0 Tested command line:
>TAppEncoderStatic -c {encoder_lowdelay_main.cfg|encoder_randomaccess_main.cfg} -f NUM_OF_FRAMES -fs 0 -fr FRAME_RATE --InputBitDepth=8|10 --OutputBitDepth=8|10 -wdt VIDEO_WIDTH -hgt VIDEO_HEIGHT -i INPUT_FILE_NAME -q QP_VALUE -b OUT_FILE_NAME
HM14.0 Random Access configuration snippet (a B-pyramid structure is used for higher quality):
#======== Coding Structure =============
IntraPeriod : 64 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 0 # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize : 8 # GOP Size (number of B slice = GOPSize-1)
Frame1: B 8 1 0.442 0 0 0 4 4 -8 -10 -12 -16 0
Frame2: B 4 2 0.3536 0 0 0 2 3 -4 -6 4 1 4 5 1 1 0 0 1
Frame3: B 2 3 0.3536 0 0 0 2 4 -2 -4 2 6 1 2 4 1 1 1 1
Frame4: B 1 4 0.68 0 0 1 2 4 -1 1 3 7 1 1 5 1 0 1 1 1
Frame5: B 3 4 0.68 0 0 1 2 4 -1 -3 1 5 1 -2 5 1 1 1 1 0
Frame6: B 6 3 0.3536 0 0 0 2 4 -2 -4 -6 2 1 -3 5 1 1 1 1 0
Frame7: B 5 4 0.68 0 0 1 2 4 -1 -5 1 3 1 1 5 1 0 1 1 1
Frame8: B 7 4 0.68 0 0 1 2 4 -1 -3 -7 1 1 -2 5 1 1 1 1 0
HM14.0 Low Delay configuration snippet:
#======== Coding Structure =============
IntraPeriod : 60 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 0 # Random Accesss 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
Frame1: B 1 3 0.4624 0 0 0 4 4 -1 -5 -9 -13 0
Frame2: B 2 2 0.4624 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1
Frame3: B 3 3 0.4624 0 0 0 4 4 -1 -3 -7 -11 1 -1 5 0 1 1 1 1
Frame4: B 4 1 0.578 0 0 0 4 4 -1 -4 -8 -12 1 -1 5 0 1 1 1 1
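In each `FrameN` entry of the HM14.0 snippets above, the second and third fields after the colon are the picture order count (POC) and the Qp offset. The sketch below reads those two fields and derives each frame's Qp for a given I-frame Qp. It assumes that a frame is coded at base Qp plus its QPoffset, with no other Qp adaptation enabled; the entries are truncated after the QPfactor field, and the function names are hypothetical.

```python
# GOP entries from the HM14.0 snippets above, truncated after the QPfactor field.
RA_LINES = [
    "Frame1: B 8 1 0.442", "Frame2: B 4 2 0.3536", "Frame3: B 2 3 0.3536",
    "Frame4: B 1 4 0.68", "Frame5: B 3 4 0.68", "Frame6: B 6 3 0.3536",
    "Frame7: B 5 4 0.68", "Frame8: B 7 4 0.68",
]
LD_LINES = [
    "Frame1: B 1 3 0.4624", "Frame2: B 2 2 0.4624",
    "Frame3: B 3 3 0.4624", "Frame4: B 4 1 0.578",
]


def parse_gop(lines):
    """Return (POC, QPoffset) pairs, in coding order, from 'FrameN: Type POC QPoffset ...' lines."""
    entries = []
    for line in lines:
        fields = line.split(":", 1)[1].split()  # [Type, POC, QPoffset, QPfactor, ...]
        entries.append((int(fields[1]), int(fields[2])))
    return entries


def frame_qps(entries, base_qp):
    """Assumed rule: each frame is coded at base Qp + its QPoffset."""
    return [(poc, base_qp + offset) for poc, offset in entries]


for name, lines in (("RA", RA_LINES), ("LD", LD_LINES)):
    print(name, frame_qps(parse_gop(lines), base_qp=32))
```

With an I-frame Qp of 32, the Random Access GOP gives Qp 33 to POC 8 and Qp 36 to the odd-numbered frames at the bottom of the pyramid, which no other frame references. The Low Delay GOP gives its lowest Qp (33) to every fourth frame (POC 4).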
Intel Media SDK supports a B-pyramid GOP structure in the Random Access configuration for higher quality. Figure 1 illustrates this B-pyramid structure, which is similar to the HM14.0 configuration. In a B-pyramid, each B frame inside a mini-GOP has its own Qp offset, and the reference structure is carefully designed to achieve the highest quality while minimizing video stream size.
Figure 1: B-pyramid GOP structure supported by Intel® Media SDK.
The command lines for the Random Access configuration, which enables this feature, and for the Low Delay configuration are as follows.
Command line for Random Access configuration:
>sample_encode.exe h265 -i INPUT_FILE_NAME -w VIDEO_WIDTH -h VIDEO_HEIGHT -u veryslow|medium|veryfast -f FRAME_RATE -o OUT_FILE_NAME -hw -async 3 -g 64 -n NUM_OF_FRAMES -cqp -qpi 18|21|23|25|27|29|32|37|42|47 -r 8 -bref
Command line for Low Delay configuration:
>sample_encode.exe h265 -i INPUT_FILE_NAME -w VIDEO_WIDTH -h VIDEO_HEIGHT -u veryslow|medium|veryfast -f FRAME_RATE -o OUT_FILE_NAME -hw -async 3 -g 60 -n NUM_OF_FRAMES -cqp -qpi 18|21|23|25|27|29|32|37|42|47 -preset conference
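Each `-qpi` alternative in the command lines above is a separate encode. The following is a minimal sketch of how such a Qp sweep could be scripted. It reproduces only the options shown above, in the same order, and builds argument lists without running anything. The helper names, input file name, and output naming scheme are hypothetical.

```python
QP_VALUES = [18, 21, 23, 25, 27, 29, 32, 37, 42, 47]  # Table 2, duplicates removed


def sample_encode_args(config, src, width, height, fps, frames, qp, usage, out):
    """Build one sample_encode argument list mirroring the RA/LD command lines above."""
    gop = "64" if config == "RA" else "60"
    args = [
        "sample_encode.exe", "h265", "-i", src, "-w", str(width), "-h", str(height),
        "-u", usage, "-f", str(fps), "-o", out, "-hw", "-async", "3",
        "-g", gop, "-n", str(frames), "-cqp", "-qpi", str(qp),
    ]
    # RA appends "-r 8 -bref" (B-pyramid); LD appends "-preset conference", as above.
    args += ["-r", "8", "-bref"] if config == "RA" else ["-preset", "conference"]
    return args


def qp_sweep(config, src, width, height, fps, frames, usage, stem):
    """One command per Qp; output names such as '<stem>_qp18.h265' are hypothetical."""
    return [
        sample_encode_args(config, src, width, height, fps, frames, qp, usage,
                           f"{stem}_qp{qp}.h265")
        for qp in QP_VALUES
    ]


# Hypothetical input file; each list could be passed to subprocess.run(cmd, check=True).
for cmd in qp_sweep("RA", "ParkJoy_1920x1080_50.yuv", 1920, 1080, 50, 400,
                    "veryslow", "parkjoy_ra"):
    print(" ".join(cmd))
```

Building argument lists instead of single command strings avoids shell-quoting problems with file names and keeps each encode independently re-runnable.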
The x265 software encoder command lines used in this quality evaluation are as follows.
Command line for Low Delay configuration (8/10 bit):
>ffmpeg.exe -y -s VIDEO_WIDTHxVIDEO_HEIGHT -pix_fmt yuv420p|yuv420p10le -r FRAME_RATE -i INPUT_FILE_NAME -frames:v NUM_OF_FRAMES -g 60 -bsf:v hevc_mp4toannexb -c:v libx265 -preset veryslow|medium|veryfast -profile main|main10 -x265-params "qp=QP_VALUE:aq-mode=0:b-adapt=0:bframes=0:b-pyramid=1:tune=psnr:no-scenecut=1:no-open-gop=1:input-depth=8|10:output-depth=8|10" OUT_FILE_NAME
Command line for Random Access configuration (8/10 bit):
>ffmpeg.exe -y -s VIDEO_WIDTHxVIDEO_HEIGHT -pix_fmt yuv420p|yuv420p10le -r FRAME_RATE -i INPUT_FILE_NAME -frames:v NUM_OF_FRAMES -g 64 -bf 7 -bsf:v hevc_mp4toannexb -c:v libx265 -preset veryslow|medium|veryfast -profile main|main10 -x265-params "qp=QP_VALUE:aq-mode=0:b-adapt=0:bframes=7:b-pyramid=1:tune=psnr:no-scenecut=1:no-open-gop=1:input-depth=8|10:output-depth=8|10" OUT_FILE_NAME
This section describes the common methodology used to evaluate the quality of each video encoder. Bitrate, in bits per second (bps), is calculated from each encoded video stream with the following formula:
Bitrate (bps) = encoded video stream file size (bytes) × 8 / (total frame count / frame rate)
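The formula divides the stream size in bits by the clip duration in seconds. A one-function sketch follows (the function name is hypothetical). In practice, the file size would come from something like `os.path.getsize` on the encoded stream.

```python
def bitrate_bps(file_size_bytes, frame_count, frame_rate):
    """Bitrate = stream size in bits / clip duration in seconds."""
    duration_s = frame_count / frame_rate
    return file_size_bytes * 8 / duration_s


# Hypothetical 5,000,000-byte stream for a 400-frame, 50 fps sequence (8 seconds).
print(bitrate_bps(5_000_000, 400, 50) / 1e6, "Mbps")
```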
The average peak signal-to-noise ratio (PSNR) of each encoded video sequence is calculated against the original uncompressed video with the following command:
> ffmpeg.exe -framerate FRAME_RATE -r FRAME_RATE -i INPUT_FILE_NAME_1 -vcodec rawvideo -pix_fmt yuv420p|yuv420p10le -s:v VIDEO_WIDTHxVIDEO_HEIGHT -r FRAME_RATE -i INPUT_FILE_NAME_2 -frames:v NUM_OF_FRAMES -lavfi psnr|ssim -f null -
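For reference, PSNR is 10·log10(peak² / MSE), where the peak is 255 for 8-bit video and 1023 for 10-bit video. The pure-Python sketch below illustrates the metric. It is not the implementation inside ffmpeg's `psnr` filter, and the function names are hypothetical. Tools differ in how they summarize a whole sequence, so the sketch returns both the mean of the per-frame PSNR values and the PSNR of the mean MSE.

```python
import math


def psnr_from_mse(mse, bit_depth=8):
    """PSNR in dB for a mean squared error; peak is 255 (8-bit) or 1023 (10-bit)."""
    if mse == 0:
        return math.inf  # identical frames
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / mse)


def frame_mse(ref, dist):
    """Mean squared error between two equal-length sequences of sample values."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)


def sequence_psnr(frame_mses, bit_depth=8):
    """Return (mean of per-frame PSNR, PSNR of the mean MSE) for one sequence."""
    per_frame = [psnr_from_mse(m, bit_depth) for m in frame_mses]
    mean_of_psnr = sum(per_frame) / len(per_frame)
    psnr_of_mean = psnr_from_mse(sum(frame_mses) / len(frame_mses), bit_depth)
    return mean_of_psnr, psnr_of_mean


# Hypothetical per-frame MSE values for a two-frame 8-bit clip.
print(sequence_psnr([1.0, 4.0]))
```

The two summaries agree when every frame has the same MSE and diverge when quality varies from frame to frame, so a comparison should use the same convention for every encoder.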
We plotted the bitrate and average PSNR of every output stream to build a rate-distortion curve (RD curve) for each target encoder and compared the curves. To quantify compression gains, we also used the Bjøntegaard delta bitrate (BD-Rate)[5], which computes the average distance between two RD curves. It expresses how much more (or less) bitrate an encoder needs than a reference encoder to reach the same quality. We used HM14.0 as the reference; a lower bar indicates better quality.
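The classic Bjøntegaard calculation fits log(bitrate) as a cubic polynomial of PSNR for each encoder, integrates both fits over the PSNR range the two curves share, and converts the average log-rate gap back to a percentage. The pure-Python sketch below follows that classic formulation. It is not necessarily the exact tool used for this paper, other variants (for example, piecewise cubic interpolation) exist, and all names are hypothetical. With ten Qp points per curve, the cubic is a least-squares fit rather than an exact interpolation.

```python
import math


def _fit_cubic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 + c3*x**3 via normal equations."""
    n = 4
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, n))) / a[r][r]
    return coef


def _integrate(coef, lo, hi):
    """Definite integral of sum(coef[k] * x**k) from lo to hi."""
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(coef))


def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):
    """BD-Rate (%) of the test encoder vs. the anchor; positive means more bits needed."""
    # A shared affine rescaling of PSNR keeps the normal equations well conditioned;
    # the average over the interval does not change under this change of variable.
    all_psnr = list(anchor_psnr) + list(test_psnr)
    mid = (max(all_psnr) + min(all_psnr)) / 2
    half = (max(all_psnr) - min(all_psnr)) / 2 or 1.0
    ua = [(p - mid) / half for p in anchor_psnr]
    ut = [(p - mid) / half for p in test_psnr]
    ca = _fit_cubic(ua, [math.log(r) for r in anchor_rate])
    ct = _fit_cubic(ut, [math.log(r) for r in test_rate])
    lo, hi = max(min(ua), min(ut)), min(max(ua), max(ut))  # overlapping range only
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    avg_log_diff = (_integrate(ct, lo, hi) - _integrate(ca, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Illustrative data only: the test curve needs 25% more bits at every PSNR point.
anchor = ([1000.0, 2000.0, 4000.0, 8000.0], [32.0, 35.0, 38.0, 41.0])
test = ([1250.0, 2500.0, 5000.0, 10000.0], [32.0, 35.0, 38.0, 41.0])
print(round(bd_rate(*anchor, *test), 2))  # 25.0
```

Under this sign convention, HM14.0 is the anchor, and a smaller positive BD-Rate means the tested encoder is closer to HM in compression efficiency.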
The new hardware HEVC encoder in 10th generation Intel Core processors delivers high-quality HEVC video streams. Its results come very close to those of the HM Test Model, the HEVC reference software used as the quality benchmark here. Across the six sequences and configurations defined above, the hardware encoder needs only 22.9 to 32.6 percent more bits than HM at the same quality level. The differences also vary smoothly across configurations, as expected.
Be ready to take advantage of this new feature and technology in your business.
Table 4: BD-Rate summary
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804