A Fresh Direction for Future Video Formats

by Mark J Buxton, Intel® Media Development Products Director

Key Points 

  • Intel is a founding member of the Alliance for Open Media.
  • HEVC is the next-gen format used by the media and broadcasting industry and represents Intel’s near-term silicon, software, and tools focus.
  • Open formats are on a trajectory to add a new element to the historical coding efficiency of licensed international standards; additional effort is required to achieve this goal.
  • Come join in the fun.

This past week, Intel announced its membership in the Alliance for Open Media. With that, we reinforce our commitment to open formats and announce our efforts to deliver the next generation of video coding tools. With founding partners Amazon, Cisco, Google, Intel, Microsoft, Mozilla, and Netflix, the Alliance for Open Media is collaborating to develop next-generation video formats that will reduce the end-user cost of video delivery by being optimized for next-generation processors.

Video is critically important to Intel’s business. Our customers are under constant pressure to improve video coding efficiency, whether to overcome the infrastructure challenges of the next 100M smartphones in India or the difficulty of fitting 4K BT.2020 into legacy cables. We need technologies like HEVC to enable the next generation, and we will need to continue that evolution in the generation beyond.

To provide some insight into the recent history of broadcast video formats, let’s consider what broadcast video really means. Over the last 20 years, video broadcast has diversified from a fixed-rate channel, to a multiplex of channels, to a packetized video stream (often carried over layered networks). These different models all coexist today. The latter two models were served originally by scalable video, and are now served in some markets by real-time encoding close to the edge of the network (for example, to adapt to channel conditions or client device capabilities). Each of these encoding models requires different levels of quality, and different algorithms to balance bitrate and video quality.

In addition to this “last mile” of broadcast, encoding happens during capture, uplink, authoring, and aggregation. The motive to keep video quality high isn’t a short-term problem: display resolutions, brightness, and contrast are improving at a phenomenal rate. At the same time, increasing resolutions and bit depths usually require compression. The move to new formats like HEVC removes network and storage bottlenecks and accelerates the ability to both create and deliver high-quality broadcast video.

Enter Moore’s law and the virtuous design cycle of microprocessors. The computational complexity (see Figure 1) of our video formats is, miraculously, holding roughly steady as we move between generations. This is not by design: HEVC is fearsomely complex compared to AVC, but innovation in algorithmic optimizations is (almost) holding pace.

Across the last major coding format iteration (AVC to HEVC), the number of processor cores available at a fixed price or power budget jumped enormously, with the latest Intel® Xeon® processors E5 family featuring up to 18 cores per socket (when AVC launched in 2003, Intel Xeon processors had only 1 core per socket!). The “density” of video coding gets another boost with the advent of the Intel® Xeon® processor E3 family and the emergence of broadcast-quality hardware video encode building blocks. An evolved form of the accelerators and software used in client processors, Intel® Quick Sync Video (QSV) hardware blocks exposed through Intel® Media Server Studio increase transcoding performance by 3X at higher video quality when comparing Intel® Core™ i7-5850 processors using QSV to the same processor running x264 in software**. We and our media and broadcast customers now support an increasing range of formats, from legacy MPEG-2 encode for traditional STBs, to AVC for the previous generation, to VP9 and HEVC for the latest generation of TVs, tablets, phones, and personal entertainment appliances.



Figure 1. * Intel® Core™ i7-4770 processor software performance and quality comparison for two generations of two families of video codecs.           

Encoding used to be one of the most expensive parts of the ecosystem. But business models can change when encoding becomes vastly cheaper. Most obviously, our customers can use the reduced cost to become more efficient and conduct more encodes:

  • Video encode becomes possible at the edge.
  • It becomes possible to encode multiple streams at different bitrates and resolutions closer to transmission to improve bandwidth adaptivity.
  • It becomes ultimately possible to optimize encode for each channel and screen.  
  • New uses with low incremental value become possible, like using video in ads in web feeds.
  • Low encode cost makes it possible to reduce system cost in storage by using the cloud.
  • Low-cost sensors use less Wi-Fi bandwidth with HEVC.
  • New types of data become possible to compress, like depth maps.   

That's a lot of encodes, and in many of these models, there are a lot of wasted encodes or bitstreams that never get consumed.

So what's the problem?

Our customers are constrained by the tariffs requested by third parties: many of these business models that exploit the commoditization of encode have a tough time dealing with tariffs on the datatype itself, or in the channel between encoders and decoders. The reason is simple: they have a hard time estimating the number and types of users of a given codec (as the types can be determined by the channel or by a user), a hard time knowing how many encodes were used (because they leverage third-party services or accept blind data from providers), and a difficult time translating multiple different license structures into their business model.

Our broadcast customers do not appear to be adopting HEVC at the same velocity we saw with the previous generation (AVC). This is a result of uncertainty caused by the tariffs. As Intel’s business is innovation and bringing high quality video to everyone on the planet, we see anything that slows HEVC adoption as a challenge. The most obvious solution for us is to attempt to fix this problem: to ensure that the next generation video format is:

  • Interoperable and open;
  • Optimized for the web;
  • Scalable to any modern device at any bandwidth;
  • Designed with a low computational footprint and optimized for hardware;
  • Capable of consistent, highest-quality, real-time video delivery; and
  • Flexible for both commercial and non-commercial content, including user-generated content.

Video Coding Formats: an Evolution

The format with the best video coding efficiency today is HEVC (see Figure 1*). There are many ways to measure video coding efficiency. The “BD-RATE” method on the vertical axis of Figure 1 is one that is widely used. It reduces bitrate and video quality to a single metric (as neither is truly independent) by comparing the curves generated from a number of quality and bitrate points against a golden format (the format used here is WG11’s HM14 reference encoder). The quality metric used for this comparison is Y-PSNR. Y-PSNR used to be an adequate video metric, but it is less and less useful on the latest generation of video coding formats. Nevertheless, HEVC is a very good format. It is possible to produce good visual quality that comes close to the ‘objective’ results, once quality issues with large block sizes are accommodated. HEVC was developed in an open process, with contributions from many nations, each composed of academic, government, and private institutions: hundreds of brilliant technologists, among them notably few lawyers.
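
To make the BD-rate idea concrete, here is a minimal Python sketch. It is not the script used for Figure 1. As a simplifying assumption it interpolates log-bitrate linearly over Y-PSNR, whereas BD-rate tools commonly fit a cubic polynomial. The function names and the rate/quality points are hypothetical, chosen only for illustration.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly ascending."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside the measured range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average log(bitrate) over the PSNR interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])      # sort by PSNR
    psnr = [p[1] for p in pts]
    log_rate = [math.log(p[0]) for p in pts]
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = min(lo + k * h, hi)                   # guard against rounding past hi
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * _interp(x, psnr, log_rate)
    return total * h / (hi - lo)


def bd_rate(reference, test):
    """Average bitrate change (%) of `test` vs `reference` at equal Y-PSNR.

    Each argument is a list of (bitrate_kbps, y_psnr_db) pairs.
    A negative result means `test` needs fewer bits for the same quality.
    """
    lo = max(min(p[1] for p in reference), min(p[1] for p in test))
    hi = min(max(p[1] for p in reference), max(p[1] for p in test))
    if hi <= lo:
        raise ValueError("the curves do not overlap in PSNR")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(reference, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Made-up points: the "test" codec spends 20% more bits at every quality level.
reference = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test = [(1200, 32.0), (2400, 35.0), (4800, 38.0), (9600, 41.0)]
print(round(bd_rate(reference, test), 2))         # about 20 (percent)
```

Working in log-bitrate keeps the comparison proportional, so a codec that always needs 20% more bits scores about +20% regardless of the absolute bitrates involved.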

An alternative model exists in WebM. VP8, the first WebM codec, started as a proprietary technology. It was purchased, opened and adapted quickly to streaming uses by Google. Google has since provided the industry with free licenses to use, free open source software, and even free hardware IP. VP8 wasn’t (and isn’t) competitive in video coding efficiency with AVC or HEVC for broadcast, but it was deployed with few license restrictions by many customers, and is especially popular in video conferencing, for which it is well-suited.  

Likewise, VP9 was recently developed as a successor to VP8, with a similar (free) license model. VP9, like HEVC, is a good and modern video codec. Consider the stills in Pictures 1, 2, and 3 below. I wanted to illustrate some of the pitfalls of using old quality metrics, so I offer up one of the harder sequences for HEVC: crowd_run. This is a ‘hard’ sequence because it offers a mix of motion types, a tremendous amount of information, and textures that challenge large blocks. While overall HEVC can yield better quality than VP9 when averaged across a large amount of content (see Figure 1), that isn’t the case on this sequence. For reasonably highly-impaired content, you can see the visual advantages of VP9.
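
One reason a metric like Y-PSNR can disagree with what the eye sees is that it depends only on the mean squared error of the luma samples, not on where the errors fall. The following is a small Python sketch of that property, assuming 8-bit luma and using made-up sample values. The function name is hypothetical.

```python
import math


def y_psnr(ref_luma, test_luma, peak=255):
    """PSNR in dB between two equal-sized luma planes (flat sequences of ints)."""
    if len(ref_luma) != len(test_luma) or not ref_luma:
        raise ValueError("planes must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref_luma, test_luma)) / len(ref_luma)
    if mse == 0:
        return math.inf                     # identical planes
    return 10.0 * math.log10(peak * peak / mse)


# Made-up 16-sample "plane" of flat grey.
ref = [100] * 16
diffuse = [102] * 16                        # small error spread over every sample
concentrated = [108] + [100] * 15           # one large, localized error

# Both distortions have the same mean squared error (4.0), so the metric
# cannot tell them apart even though they would look different on screen.
print(y_psnr(ref, diffuse) == y_psnr(ref, concentrated))
```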

Like HEVC, VP9 offers support for expanded bit depth, expanded color representations, large resolutions, and a range of applications. VP9 is much closer to HEVC in quality than VP8 was to AVC, and I will hypothesize (due to the relative youth of VP9) that this gap will close a bit further.


Picture 1. * VP9 coded at 8.5 Mbps using presets “--good --cpu-used=0”. Zoomed-in region from crowd_run. Note the detail in the trees. Very good results for a very tough sequence (unfortunately, at this quality it runs two orders of magnitude slower than the others).

 


Picture 2. * AVC coded at 12 Mbps using preset “veryslow”. Zoomed-in region from the standard test sequence crowd_run. Note the much lower detail in the trees and the ringing. Despite this, the objective PSNR of the AVC version is 2 dB higher (!).

 


Picture 3. * HEVC coded at 7.6 Mbps using preset “–TU4”. Zoomed-in region from crowd_run. Fewer egregious coding artifacts than x264 at a much lower bitrate, but not as good as VP9. (Objectively, this sequence is 10% lower bitrate than the VP9 version at the same Y-PSNR.) Interestingly, the software version runs at twice the speed of the AVC version used above.

 

Google was not alone.  Others looking for royalty-free video created new video coding formats. Probably the most notable of these are Xiph/Mozilla Daala, Cisco’s Thor, and China’s AVS (v1 and v2). 

Both models are able to create video coding formats of equivalent technical quality. Why then did we join the Alliance?  

It's because we believe, as do the other founders, that the successor to HEVC and VP9 needs to be not only a large leap forward in video coding efficiency, but also open and interoperable. The new Alliance is committing its collective technology and expertise to meet growing internet demand for top-quality video, audio, imagery and streaming across devices of all kinds and for users worldwide. The Alliance gives us the opportunity to converge Thor, Daala, and VP10 into a single consistent next-generation video format, thereby creating opportunities for next-generation media experiences.

What and When?

In case you’re hoping for a new video codec by Christmas, it will take a bit longer. The Alliance is going to move fast, but the current generation of video formats is a good upgrade and we have a big investment in hardware, software, and tools to make HEVC (and VP9) successful. It takes time to develop a new video format that is a true “leap ahead” of HEVC (so don’t wait for the Alliance’s output before upgrading from AVC to garner the benefits of HEVC).

We are confident, however, that by working in this direction together, we can create an open source project that will develop next-generation media formats, codecs and technologies in the public interest.

Come Join in the Fun!

We’ve been asked by interested parties how they can help. Here are a few things I’d like to see from the broader community … even if you don’t decide to get directly involved with the Alliance. 

  • Intel is looking for original, uncompressed, and free broadcast quality video content for the effort – both for technical uses by the compression team and for showcasing this technology.
     
  • Intel encourages innovation in new technology for video quality analysis that addresses the challenges of next-generation coding tools. This includes specific attention to ringing, flicker, and motion-sensitive artifacts that can be exacerbated by large blocks and by very high brightness and contrast displays. Intel’s Video Quality Caliper (part of the Intel® Video Pro Analyzer) provides a plugin to help you innovate in this space.
     
  • Most important, Intel needs your support. If you favor open and royalty-free video formats, we’d like to know.  Feel free to contact us (below).

Contacts


*Benchmarking Details

The process we used for measuring HEVC performance and video quality is described in the whitepaper at https://software.intel.com/sites/default/files/managed/d7/07/Intel_HEVCWhitepaper_v1%2050_R6_24Jun2015.pdf

We used the excellent third party tools from WebM and x264 to represent these codecs:

WebM: https://chromium.googlesource.com/webm/libvpx/+/v1.4.0
x264: http://download.videolan.org/pub/videolan/x264/binaries/win64/x264-r2597-e86f3a1.exe

We used CQP modes to try to avoid the variation introduced by different types of bitrate control, which are often customized by applications.  This isn’t entirely neutral from a coding tools perspective as the underlying mini-GOP structures are still significantly different in these three implementations.

For x264: park_joy and ducks_take_off use QPs [28, 31, 35, 37, 40, 45]. crowd_run uses QPs [26, 30, 34, 38, 40, 45]. bq_terrace uses QPs [25, 27, 31, 34]. park_scene and touchdown_pass use QPs [23, 26, 29, 32]. VP8 and VP9 use QPs [40, 50, 55, 60, 63] for all sequences. For HEVC, see the whitepaper link above.

VP8 usage:  vpxenc -p 1 --good --cpu-used=16 --end-usage=q --cq-level=40 --max-q=40 --tune=psnr --verbose --psnr -w 1920 -h 1080 --fps=50/1 --limit=500 --codec=vp8

VP9 usage: vpxenc -p 1 --good --cpu-used=8 --end-usage=q --cq-level=40 --tune=psnr --verbose --psnr -w 1920 -h 1080 --fps=50/1 --limit=500 --codec=vp9

x264 usage: x264-r2597-e86f3a1.exe --qp 26 --preset veryslow --tune psnr --keyint 100000  --input-res 1920x1080
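
As an illustration of how a constant-QP sweep like this could be scripted, the Python sketch below expands the x264 QP ladders listed above into one command line per sequence and QP. It is not the harness we used. The input and output file names are hypothetical, and it assumes every sequence is 1920x1080, as in the x264 usage line.

```python
# QP ladders copied from the benchmarking notes above.
X264_QPS = {
    "park_joy": [28, 31, 35, 37, 40, 45],
    "ducks_take_off": [28, 31, 35, 37, 40, 45],
    "crowd_run": [26, 30, 34, 38, 40, 45],
    "bq_terrace": [25, 27, 31, 34],
    "park_scene": [23, 26, 29, 32],
    "touchdown_pass": [23, 26, 29, 32],
}


def x264_commands(exe="x264-r2597-e86f3a1.exe"):
    """Return one x264 argument list per (sequence, QP) pair."""
    commands = []
    for name, qps in X264_QPS.items():
        for qp in qps:
            commands.append([
                exe,
                "--qp", str(qp),
                "--preset", "veryslow",
                "--tune", "psnr",
                "--keyint", "100000",
                "--input-res", "1920x1080",       # assumed for all sequences
                "-o", f"{name}_qp{qp}.264",       # hypothetical output name
                f"{name}.yuv",                    # hypothetical input name
            ])
    return commands


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in x264_commands():
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` once real input files are in place. The resulting bitrate and Y-PSNR pairs are the points a BD-rate comparison is built from.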

Baseline configuration: Intel® Media Server Studio 2015 Professional R7 running on Microsoft* Windows* 8.1. Intel Customer Reference Platform with Intel® Core i7-4770 processor (84W, 4C, 3.5 GHz, Intel® HD Graphics 4600). Intel Z87KL desktop board with Intel Z87LPC, 16 GB (4x4GB DDR3-1600MHz UDIMM), 1.0TB 7200 SATA HDD, Turbo Boost Enabled, and HT Enabled. Source: Intel internal measurements as of September 2015.

** To compare the benefits of Intel Quick Sync Video vs. software, we use the following baseline configuration: Intel Shark Bay V2 Customer Reference Platform with Intel® Core i7-5850 processor (43W, 4C, 2.7 GHz, Intel® Iris Pro Graphics P6200), 16 GB (4x4GB DDR3-1600MHz UDIMM), 1.0TB 7200 SATA HDD, Turbo Boost Enabled, HT Enabled, Microsoft* Windows* 8.1 64-bit, Intel® Media Server Studio 2015 R6 Essentials Edition. x264 using the “veryfast” preset and QSV using –TU4. Source: Intel internal measurements as of August 2015.

Intel, the Intel logo, Core and Xeon are trademarks of Intel Corporation in the U.S. or other countries.
* Other names and brands may be claimed as the property of others.
© Intel Corporation 2015
