Improve Encoding Efficiency and Video Quality with Adaptive LTR

Introduction

In media applications, one important goal of an encoder is to lower the bitrate while maintaining high video quality. The Intel® Media SDK team has been working to improve this capability on Intel hardware.

Media SDK introduced Adaptive Long-Term Reference (LTR) in the 2018 R1 Windows release. It is an intelligent encoding feature that significantly improves compression efficiency and video quality for video conferencing, surveillance, and certain graphics/game streaming applications.

By adding a long-term reference and increasing I-frame quality for low-motion content, we improved coding efficiency by 20-24% on our test sequences (see Table 1).

Background

To improve encoding capability, the challenge is to increase compression efficiency while preserving the original video quality. The H.264 standard uses motion-compensated prediction to reach this goal: the encoder predicts each new frame by shifting macroblocks or sub-blocks taken from other, already coded frames, which are called reference frames. The standard defines two kinds of reference frame, short-term and long-term. Long-Term Reference (LTR) frames are kept and can be referenced until the application explicitly removes them.
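A toy model makes the difference between the two kinds of reference concrete. The sketch below is hypothetical (the class and method names are ours, not Media SDK or H.264 syntax) and simplifies the real H.264 reference marking process: short-term references leave the buffer through a sliding window as new frames arrive, while the long-term slot is kept until it is replaced or explicitly removed.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical toy model of a reference buffer (not Media SDK code and not
// the exact H.264 marking process). Frames are identified by a non-negative
// frame number only; no pixel data is stored.
class ReferenceBuffer {
public:
    explicit ReferenceBuffer(std::size_t maxShortTerm) : maxShortTerm_(maxShortTerm) {}

    // Short-term references follow a sliding window: once the window is
    // full, adding a frame evicts the oldest one.
    void addShortTerm(int frameNum) {
        shortTerm_.push_back(frameNum);
        if (shortTerm_.size() > maxShortTerm_) shortTerm_.pop_front();
    }

    // The long-term reference is unaffected by the sliding window; it stays
    // until it is replaced or explicitly removed.
    void markLongTerm(int frameNum) { longTerm_ = frameNum; }
    void removeLongTerm() { longTerm_ = kNone; }

    // Can frameNum still be used as a reference?
    bool isAvailable(int frameNum) const {
        if (longTerm_ != kNone && longTerm_ == frameNum) return true;
        for (int f : shortTerm_) {
            if (f == frameNum) return true;
        }
        return false;
    }

private:
    static constexpr int kNone = -1;  // marker for "no long-term reference"
    std::size_t maxShortTerm_;
    std::deque<int> shortTerm_;
    int longTerm_ = kNone;
};
```

For example, with room for one short-term reference, frame 0 marked as long-term is still available after ten more frames have been added, whereas frame 5 has already slid out of the window.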

Reference frames encoded at higher quality improve the quality of the frames predicted from them. When the scene is stable, the same reference frame can be kept while many more frames are decoded, which avoids sending another high-quality reference frame and therefore saves bandwidth. An advantage of LTR frames is that they are controlled by the encoding process at the application level. This flexibility lets the encoder combine LTR with other tools to make better decisions about when to change the reference frame, and many LTR-based algorithms have been developed on this basis. For example, Cisco has long used LTR for error resilience in its video conferencing applications.

This is why Intel® Media SDK chose LTR as a new feature for its hardware-based codec solution. A long-term reference allows, for example, the scene background to be encoded at high quality for better motion-compensated prediction in many future frames. Using LTR frames effectively requires detecting such stable content, choosing the right frame to assign as LTR, encoding that frame at high quality, and turning LTR off for unsuitable content. Doing all of this automatically is what we call Adaptive Long-Term Reference.

Before going into the details, let's look at how LTR changes the prediction structure. Figure 1 shows low-delay inter-frame prediction structures with and without LTR. The structure with LTR has two advantages at the decoder side: first, every frame can reference the high-quality LTR frame, which gives better image quality; second, each frame references only the previous frame and the LTR, which saves the memory that would otherwise be needed to cache an extra frame.

Image Placeholder
Figure 1: Low-delay inter-frame prediction structures using two reference frames: (a) without LTR and (b) with LTR.
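The two structures in Figure 1 can also be written as reference lists. In this sketch, which assumes for illustration that frame 0 is the I-frame, that structure (a) predicts each P-frame from the two most recent frames, and that structure (b) predicts from the previous frame plus a fixed LTR frame, the function names are hypothetical.

```cpp
#include <vector>

// (a) Without LTR: frame n references the two most recent frames, n-1 and n-2.
// Frame 0 (the I-frame) has no references.
std::vector<int> refsWithoutLtr(int n) {
    std::vector<int> refs;
    if (n >= 1) refs.push_back(n - 1);
    if (n >= 2) refs.push_back(n - 2);
    return refs;
}

// (b) With LTR: frame n references the previous frame plus the long-term
// reference ltrFrame (assumed to be an earlier frame, e.g. the I-frame 0).
std::vector<int> refsWithLtr(int n, int ltrFrame) {
    std::vector<int> refs;
    if (n >= 1) refs.push_back(n - 1);
    // Add the LTR only if it is a distinct, earlier frame than n-1.
    if (ltrFrame < n - 1) refs.push_back(ltrFrame);
    return refs;
}
```

For frame 5, structure (a) gives {4, 3} while structure (b) gives {4, 0}: the second reference stays the high-quality frame 0 no matter how far the stream advances.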

Adaptive LTR

Adaptive LTR is an intelligent, content-analysis-based feature. It automatically turns on the LTR prediction structure based on scene characteristics, automatically decides which frames to assign as LTR, and uses an advanced bitrate controller that adaptively distributes bits and quality across a scene to maximize LTR prediction efficiency.
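To make these decisions concrete, here is a deliberately simple, hypothetical controller. It is not the Media SDK algorithm, whose content analysis and bitrate controller are internal. It assumes the mean absolute luma difference between consecutive frames as its only motion measure, marks a key frame as the LTR when the preceding frames were nearly static, gives that frame a negative QP offset (higher quality), and stops predicting from the LTR as soon as motion is detected. All names and threshold values are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// HYPOTHETICAL illustration of Adaptive LTR-style decisions.
// NOT the Media SDK algorithm; names and thresholds are made up.

// Mean absolute difference between two equally sized 8-bit luma planes,
// used here as a crude measure of how much the scene changed.
double meanAbsDiff(const std::vector<std::uint8_t>& a,
                   const std::vector<std::uint8_t>& b) {
    if (a.empty() || a.size() != b.size()) return 0.0;
    long long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    }
    return static_cast<double>(sum) / static_cast<double>(a.size());
}

struct LtrDecision {
    bool useLtr;     // predict this frame from the LTR?
    bool markAsLtr;  // make this frame the new LTR?
    int qpDelta;     // QP offset for this frame (negative = higher quality)
};

class AdaptiveLtrController {
public:
    AdaptiveLtrController(double stillThreshold, int minStillFrames, int ltrQpDelta)
        : stillThreshold_(stillThreshold), minStillFrames_(minStillFrames),
          ltrQpDelta_(ltrQpDelta) {}

    // frameDiff: meanAbsDiff() against the previous frame.
    // isKeyFrame: true for the I-frame that starts each GOP.
    LtrDecision next(double frameDiff, bool isKeyFrame) {
        // Detect stable content: count consecutive near-static frames.
        stillRun_ = (frameDiff < stillThreshold_) ? stillRun_ + 1 : 0;
        const bool stable = stillRun_ >= minStillFrames_;

        LtrDecision d{false, false, 0};
        if (isKeyFrame) {
            // Stable content: assign the key frame as LTR and spend extra
            // bits on it through a negative QP offset.
            hasLtr_ = stable;
            d.markAsLtr = stable;
            d.qpDelta = stable ? ltrQpDelta_ : 0;
        } else if (!stable) {
            // Motion detected: stop predicting from the LTR until a new one
            // is assigned at a later key frame.
            hasLtr_ = false;
        }
        d.useLtr = !isKeyFrame && hasLtr_;
        return d;
    }

private:
    double stillThreshold_;
    int minStillFrames_;
    int ltrQpDelta_;
    int stillRun_ = 0;
    bool hasLtr_ = false;
};
```

Resetting the LTR only at key frames keeps the sketch short; a real controller would use richer statistics and could reassign the LTR at other points in the GOP.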

Adaptive LTR supports the different prediction structures available in Media SDK. The "IPPP..." structure shown in Figure 1 is one example; it is configured with Target Usage 4 (TU4, the Media SDK default) and a reference distance of 1. Because this structure has a simple dependency tree, it gives low delay for video streaming.
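The following sketch shows why the reference distance matters for delay. It generates a simple display-order frame-type pattern from a GOP size and a reference distance (the values sample_encode sets with "-g" and "-r", which correspond to GopPicSize and GopRefDist in mfxInfoMFX). The function name is hypothetical, and it ignores hierarchical B-frames and GOP-boundary adjustments, so treat it as an illustration only.

```cpp
#include <string>

// Hypothetical helper: display-order frame types for one GOP.
// An anchor (I or P) every refDist frames, B-frames in between.
std::string gopPattern(int gopSize, int refDist) {
    std::string pattern;
    for (int i = 0; i < gopSize; ++i) {
        if (i == 0) {
            pattern += 'I';                      // GOP starts with an I-frame
        } else if (refDist <= 1 || i % refDist == 0) {
            pattern += 'P';                      // anchor frame
        } else {
            pattern += 'B';                      // needs a later anchor
        }
    }
    return pattern;
}
```

With refDist = 3 the pattern is "IBBPBBP...", and each B-frame depends on the following anchor, so the encoder must buffer up to refDist - 1 frames before coding them. With refDist = 1 the pattern is "IPPP..." and every frame can be encoded as soon as it arrives, which is what makes this structure low delay.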

The pictures below show the visual quality improvement: Figure 2 shows a video conferencing test sequence, and Figure 3 shows a surveillance test sequence.

Image Placeholder
Figure 2: Frame 250 of the “Vidyo4” sequence encoded at 500 kbps with low-delay CBR, a 2-second GOP, and the AVC balanced preset: (a) encoded without LTR, (b) encoded with Adaptive LTR. The center of the video (640x360) has been enlarged to show the quality differences.
Image Placeholder
Figure 3: Frame 50 of the “ShoppingMall” sequence encoded at 1000 kbps with low-delay CBR, a 2-second GOP, and the AVC balanced preset: (a) encoded without LTR, (b) encoded with Adaptive LTR. The center of the video (640x360) has been enlarged to show the quality differences.

Figure 4 shows that Adaptive LTR encoding is 20-24% more coding-efficient than encoding without LTR.

Image Placeholder
Figure 4: Rate-distortion curves (bitrate vs. PSNR) at four bitrates for the “Vidyo4” and “ShoppingMall” test sequences, encoded with Adaptive LTR and without LTR using low-delay CBR, a 2-second GOP, and the balanced preset.
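The PSNR axis in Figure 4 compares each decoded frame with its source frame. The article does not say which tool or color planes were used, so the following sketch assumes the common per-plane definition for 8-bit video, PSNR = 10 log10(255² / MSE), usually computed on the Y plane and averaged over all frames of a sequence. The function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two non-empty 8-bit planes of equal size (e.g. the Y
// plane of a source frame and of the decoded frame).
double psnr8bit(const std::vector<std::uint8_t>& ref,
                const std::vector<std::uint8_t>& test) {
    double sse = 0.0;  // sum of squared errors
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = static_cast<double>(ref[i]) - static_cast<double>(test[i]);
        sse += d * d;
    }
    const double mse = sse / static_cast<double>(ref.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();  // identical planes
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

An average error of one code value (MSE = 1) corresponds to about 48.13 dB; typical low-bitrate encodes like those in Figure 4 sit well below that.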

Table 1 shows the BD-rate analysis of these sequences.

Table 1: BD-rate of Adaptive LTR relative to encoding without LTR (a negative value is the average bitrate saving at equal PSNR).
Sequence | Resolution | FPS | Bitrates (kbps) | BD-rate
Vidyo4_720p_60fps | 1280x720 | 60 | 1500, 1000, 750, 500 | -24%
ShoppingMall_02 | 1920x1080 | 30 | 1500, 1000, 750, 500 | -20%
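The BD-rate column follows Bjøntegaard's method (VCEG-M33, listed in the References): fit the logarithm of bitrate as a cubic function of PSNR for each curve, integrate both fits over the PSNR range the curves share, and convert the average difference back to a percentage. The sketch below implements that basic method; it is not necessarily the script used for Table 1, and later variants (for example piecewise cubic interpolation) can give slightly different numbers. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

struct RdPoint {
    double kbps;  // bitrate
    double psnr;  // dB
};

// Least-squares cubic fit y ~ c0 + c1*x + c2*x^2 + c3*x^3 via the normal
// equations, solved by Gauss-Jordan elimination with partial pivoting.
static std::array<double, 4> fitCubic(const std::vector<double>& x,
                                      const std::vector<double>& y) {
    double a[4][5] = {};  // augmented 4x4 system
    for (std::size_t k = 0; k < x.size(); ++k) {
        double p[7];
        p[0] = 1.0;
        for (int i = 1; i < 7; ++i) p[i] = p[i - 1] * x[k];
        for (int r = 0; r < 4; ++r) {
            for (int c = 0; c < 4; ++c) a[r][c] += p[r + c];
            a[r][4] += p[r] * y[k];
        }
    }
    for (int col = 0; col < 4; ++col) {
        int piv = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
        for (int c = 0; c < 5; ++c) std::swap(a[col][c], a[piv][c]);
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (int c = col; c < 5; ++c) a[r][c] -= f * a[col][c];
        }
    }
    return {a[0][4] / a[0][0], a[1][4] / a[1][1], a[2][4] / a[2][2], a[3][4] / a[3][3]};
}

// BD-rate of `test` relative to `anchor` as a fraction (-0.2 = 20% saving).
double bdRate(const std::vector<RdPoint>& anchor, const std::vector<RdPoint>& test) {
    if (anchor.size() < 4 || test.size() < 4)
        throw std::invalid_argument("need at least 4 RD points per curve");
    // Center PSNR values to keep the polynomial fit well conditioned.
    double center = 0.0;
    for (const auto& p : anchor) center += p.psnr;
    for (const auto& p : test) center += p.psnr;
    center /= static_cast<double>(anchor.size() + test.size());

    // Fit ln(bitrate) as a cubic in centered PSNR; report the PSNR range.
    auto fit = [center](const std::vector<RdPoint>& pts, double& lo, double& hi) {
        std::vector<double> x, y;
        lo = hi = pts[0].psnr;
        for (const auto& p : pts) {
            x.push_back(p.psnr - center);
            y.push_back(std::log(p.kbps));
            lo = std::min(lo, p.psnr);
            hi = std::max(hi, p.psnr);
        }
        return fitCubic(x, y);
    };
    double loA, hiA, loT, hiT;
    const auto cA = fit(anchor, loA, hiA);
    const auto cT = fit(test, loT, hiT);

    // Integrate both fits over the common PSNR interval.
    const double lo = std::max(loA, loT) - center;
    const double hi = std::min(hiA, hiT) - center;
    if (hi <= lo) throw std::invalid_argument("RD curves do not overlap in PSNR");
    auto integral = [lo, hi](const std::array<double, 4>& c) {
        double s = 0.0;
        for (int i = 0; i < 4; ++i)
            s += c[i] * (std::pow(hi, i + 1) - std::pow(lo, i + 1)) / (i + 1);
        return s;
    };
    const double avgLogDiff = (integral(cT) - integral(cA)) / (hi - lo);
    return std::exp(avgLogDiff) - 1.0;
}
```

For example, if the Adaptive LTR curve reached every PSNR value at 80% of the anchor's bitrate, bdRate would return -0.2, i.e. a BD-rate of -20%.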

This feature is available for the user-defined (external) BRC, ExtBRC. It turns on automatically when the internal rate controller provided in the SDK is used (implicit mode) and the encoding parameters allow LTR. A new option, mfxExtCodingOption3::ExtBrcAdaptiveLTR (on/off), is available in API 1.26 to control Adaptive LTR. The section below shows the sample_encode usage and command-line parameters used to generate the quality comparison above.

Run sample_encode with LTR

  1. Install Intel® Media SDK 2018 R1 for Windows and download the sample code.
  2. Open the sample_encode project and make the following changes to encode_pipeline.cpp:
    mfxStatus CEncodingPipeline::InitMfxEncParams(sInputParams *pInParams)
    {
    ......
        // configure the depth of the look ahead BRC if specified in command line
        if (pInParams->nLADepth || pInParams->nMaxSliceSize || pInParams->nMaxFrameSize || pInParams->nBRefType ||
            (pInParams->nExtBRC && (pInParams->CodecId == MFX_CODEC_HEVC || pInParams->CodecId == MFX_CODEC_AVC)) ||
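            // Original last line of the condition, replaced by the line below:
            // pInParams->IntRefType || pInParams->IntRefCycleSize || pInParams->IntRefQPDelta )
            // Added: also attach mfxExtCodingOption2 for any non-CQP rate control
            pInParams->IntRefType || pInParams->IntRefCycleSize || pInParams->IntRefQPDelta || m_mfxEncParams.mfx.RateControlMethod != MFX_RATECONTROL_CQP)
        {
        ......
            // Added: turn off the bitrate limitation the SDK encoder imposes by
            // default, so low target bitrates such as 500 kbps are kept as set
            m_CodingOption2.BitrateLimit = MFX_CODINGOPTION_OFF;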
            m_CodingOption2.IntRefType = pInParams->IntRefType;
            m_CodingOption2.IntRefCycleSize = pInParams->IntRefCycleSize;
            m_CodingOption2.IntRefQPDelta = pInParams->IntRefQPDelta;
            m_EncExtParams.push_back((mfxExtBuffer *)&m_CodingOption2);
        }
    ......
  3. Build the project
  4. Download the "vidyo4" test sequence (vidyo4_720p_60fps.y4m) from Xiph.org Video Test Media.
  5. Install FFmpeg and run the following command to convert the Y4M file into a raw YUV stream:
    ffmpeg -i %USERPROFILE%\Downloads\vidyo4_720p_60fps.y4m vidyo4.yuv
  6. The following commands run sample_encode without and with the Adaptive LTR feature. The "-extbrc:implicit" option selects the SDK's internal (implicit) BRC, which enables the LTR module internally.
    sample_encode h264 -i vidyo4.yuv -o vidyo4_noltr.h264 -w 1280 -h 720 -hw -b 500 -r 1 -g 120 -f 60
    sample_encode h264 -i vidyo4.yuv -o vidyo4_altr.h264 -w 1280 -h 720 -hw -b 500 -r 1 -g 120 -f 60 -extbrc:implicit
  7. To see the quality improvement clearly, use FFmpeg to combine the two output clips into one side-by-side clip, then use ffplay to view both results at the same time. Press "p" to pause playback and "s" to step one frame at a time, so the two images can be compared at any point in the clip.
    ffmpeg -i vidyo4_noltr.h264 -i vidyo4_altr.h264 -filter_complex "[0:v:0]pad=(iw*2)+10:ih[bg]; [bg][1:v:0]overlay=w" output.h264
    ffplay -i output.h264

Note: The tests use the following sample_encode parameters:

  • Target Usage: values 1 to 7, where 1 gives the best quality and 7 the best speed. It is set with "-u". The example commands omit it, so Target Usage 4 (TU4), the Media SDK default, is used.
  • Media SDK supports many different prediction structures. The "IPPP..." structure shown in Figure 1 is configured with TU4 and a reference distance of 1 ("-r 1"). The reference distance has to be specified because its default for TU4 is 3.
  • The GOP size is set to 120 frames ("-g 120"), which at 60 fps ("-f 60") means a key-frame interval of 120/60 = 2 seconds.
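The arithmetic behind these settings can be extended to the average CBR budget, as sketched below. The sketch assumes 1 kbps = 1000 bit/s and only computes averages; the actual per-frame allocation is decided by the rate controller, which under Adaptive LTR can give the LTR/I-frame a larger share. The struct and function names are hypothetical.

```cpp
// Hypothetical helper: key-frame interval and average CBR bit budget.
struct GopBudget {
    double keyFrameIntervalSec;  // GOP size / frame rate
    double bitsPerGop;           // target bitrate * GOP duration
    double avgBitsPerFrame;      // target bitrate / frame rate
};

GopBudget gopBudget(int gopSize, double fps, double kbps) {
    GopBudget b;
    b.keyFrameIntervalSec = gopSize / fps;
    b.bitsPerGop = kbps * 1000.0 * b.keyFrameIntervalSec;  // assumes 1 kbps = 1000 bit/s
    b.avgBitsPerFrame = kbps * 1000.0 / fps;
    return b;
}
```

With the settings above ("-g 120 -f 60 -b 500"), gopBudget(120, 60, 500) gives a 2-second key-frame interval, 1,000,000 bits per GOP, and about 8,333 bits per frame on average.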

References

Cisco Design Guide

H.264/AVC Inter Prediction

Hierarchical Prediction Structures

A Long-Term Reference Frame for Hierarchical B-Picture-Based Video Coding

Xiph.org Video Test Media

G. Bjøntegaard, “Calculation of Average PSNR Differences Between RD-curves,” in VCEG-M33 ITU-T Q6/16, Austin, TX, USA, April 2001.
