Video Encoding Using Integrated Intel® HD Graphics

This article describes H.264 video processing on Intel GPUs under Linux and what our company, Inventos, learned while enhancing StreamBuilder, our streaming media server.

 

Introduction

When Intel® Media Server Studio was released for Linux, we were keen to integrate Intel® Quick Sync Video technology into StreamBuilder, our versatile media server software that works as a backend for Webcaster.pro. At that point StreamBuilder could:

  • capture input streams from SDI, IP multicasts, RTMP

  • transcode and resample virtually any audio/video stream to H.264 HLS or RTMP

  • support a distributed, fault-tolerant deployment scheme in which ingesting, encoding and streaming are performed on independent and redundant nodes

  • apply filters (audio normalization/amplification, video deinterlacing, cropping, resizing, etc.)

  • be configured flexibly through its own DSL, which allows building pipelines (and even trees) for sequential media processing using the filters mentioned above
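
To illustrate what a processing tree means here, the following is a minimal Python sketch. It is not StreamBuilder's DSL (which the article does not show): the Node class, the filter functions and the dictionary used as a stand-in for a video frame are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Frame = dict  # hypothetical stand-in for a decoded video frame


@dataclass
class Node:
    """One processing step; children receive this node's output."""
    name: str
    transform: Callable[[Frame], Frame]
    children: List["Node"] = field(default_factory=list)

    def then(self, child: "Node") -> "Node":
        # Attach a downstream step and return it so chains can be built.
        self.children.append(child)
        return child

    def push(self, frame: Frame, outputs: list) -> None:
        # Apply this step, then fan the result out to every child.
        result = self.transform(frame)
        if not self.children:
            outputs.append((self.name, result))  # leaf = final output
        for child in self.children:
            child.push(result, outputs)


def deinterlace(frame: Frame) -> Frame:
    return {**frame, "interlaced": False}


def resize(height: int) -> Callable[[Frame], Frame]:
    def apply(frame: Frame) -> Frame:
        return {**frame, "height": height}
    return apply


# One input branch feeding two output renditions (a small tree).
source = Node("deinterlace", deinterlace)
source.then(Node("hls_1080p", resize(1080)))
source.then(Node("hls_720p", resize(720)))

outputs = []
source.push({"height": 1080, "interlaced": True}, outputs)
for name, frame in outputs:
    print(name, frame)
```

The point of the tree shape is that shared steps (here, deinterlacing) run once per input frame, and only the per-rendition steps are repeated for each output.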

 

StreamBuilder is based on libavcodec, and although it was already well optimized, it was designed to run on x86 CPUs. Adding CPU cores speeds up encoding almost linearly, but that is expensive, and the CPU always has other tasks besides video encoding. Moving encoding to the GPU could make processing faster and cheaper, with a higher channel density per rack unit.

 

Intel solution

So the plan was set: rewrite a major part of the StreamBuilder core and integrate the Intel® Media Server Studio SDK to get a significant performance boost. Our goal was to encode at least 4 Full HD streams on budget-priced hardware. To anticipate the results: we exceeded that goal.

 

Colleagues from Intel were also interested in a real-life use case for the Media Server Studio SDK. They did a great job helping us during development and deployment: answering our questions, providing code samples and giving valuable advice.

 

The Media Server Studio SDK comes with documentation and examples that cover almost all possible use cases. They helped us a lot and greatly simplified the implementation. In our case, the work came down to replacing the decoding, encoding and resampling modules with Intel Quick Sync Video-enabled modules that use the capabilities of Intel® HD Graphics.

 

Staging hardware and software

We used a 1RU (rack unit) server with the following specs:

Motherboard:  Supermicro X10SLH-F
CPUs:         #1 Intel® Xeon® CPU E3-1225 v3, Intel® HD Graphics P4600
              #2 Intel® Core™ i7-3770, Intel® HD Graphics 4000
              #3 Intel® Xeon® CPU E3-1285 v3, Intel® HD Graphics P4700
RAM:          16 GB
OS:           Ubuntu 12.04.4 LTS, kernel 3.8.0-23-generic

 

For this processor generation, the server motherboard chipset must be the C226 PCH, because at the time of writing only that server chipset supports hardware encoding. It is also highly recommended to use a motherboard without a built-in GPU; otherwise the system may have trouble identifying and using the processor's GPU.

 

The motherboard we used had a built-in GPU, which caused us a lot of headaches. At first the Intel Media Server Studio SDK did not recognize the device ID, so we could not enable Quick Sync Video. After a BIOS update the required setting appeared in the BIOS, but we still had to disable the motherboard's GPU manually with an on-board jumper. That configuration blocks IPMI and video output, but we access the server via SSH, so it was not a big issue.

 

Note that there are some limitations on the Linux kernel version: 3.2.0-41 or 3.8.0-23 for Ubuntu 12.04, and 3.0.76-11 for SUSE Linux Enterprise Server SP3.
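
A quick way to check a machine against these kernel versions is sketched below in Python. The version list is taken from this article only; the function name and the suffix-matching rule (accepting a flavour suffix such as "-generic") are assumptions made for illustration, not part of the SDK.

```python
import platform

# Kernel versions named in this article for the SDK release it describes.
SUPPORTED_KERNELS = {
    "Ubuntu 12.04": ("3.2.0-41", "3.8.0-23"),
    "SUSE Linux Enterprise Server SP3": ("3.0.76-11",),
}


def matching_distro(release):
    """Return the distro whose listed kernel matches `release`, or None.

    `release` is a string as printed by `uname -r`, for example
    "3.8.0-23-generic". A trailing flavour suffix is tolerated.
    """
    for distro, versions in SUPPORTED_KERNELS.items():
        for version in versions:
            if release == version or release.startswith(version + "-"):
                return distro
    return None


if __name__ == "__main__":
    release = platform.release()
    distro = matching_distro(release)
    if distro is None:
        print(f"Kernel {release} is not in the list from the article")
    else:
        print(f"Kernel {release} matches the version listed for {distro}")
```

The check compares the full version prefix rather than only the major and minor numbers, because the article lists exact kernel builds.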

 

Results

CPU: E3-1225 v3, 16 GB RAM, Intel® HD Graphics P4600

               ffmpeg       sample_full_transcode   streambuilder (no optimization)   streambuilder (optimization)
time           8 min 42 s   1 min 19 s              2 min 19 s                        1 min 40 s
cpu (max)      750%         55%                     125%                              50%
mem (max)      3.3%         4.6%                    0.5%                              0.4%
PSNR           48.107       46.68
Average PSNR   51.204       49.52
SSIM           0.99934      0.9956
MSE            1.623        2.969

 

 

CPU: i7-3770, 3 GB RAM, Intel® HD Graphics 4000

               ffmpeg       sample_full_transcode   streambuilder (no optimization)   streambuilder (optimization)
time           8 min 48 s   1 min 24 s              2 min 31 s                        1 min 23 s
cpu (max)      750%         19%                     150%                              45%
mem (max)      18%          20%                     2.8%                              2.3%
PSNR           48.107       46.495
Average PSNR   51.204       49.27
SSIM           0.99934      0.991
MSE            1.623        3.036


CPU: E3-1285 v3, 16 GB RAM, Intel® HD Graphics P4700

               ffmpeg       sample_full_transcode   streambuilder (no optimization)   streambuilder (optimization)
time           8 min 1 s    1 min 11 s              2 min 11 s                        1 min 34 s
cpu (max)      750%         55%                     130%                              55%
mem (max)      3.3%         4.6%                    0.5%                              0.4%
PSNR           48.107       46.68
Average PSNR   51.204       49.52
SSIM           0.99934      0.9956
MSE            1.623        2.969

 

StreamBuilder's signal quality metrics (PSNR, SSIM, MSE) are equal to the sample_full_transcode values, so we did not repeat them in the tables.

 

As the tables above show, the server CPUs with Intel HD Graphics P4700/P4600 performed better in our test and produced better output video quality than the i7-3770 with Intel HD Graphics 4000. That will not always hold, though. Intel keeps improving video encoding with each chip generation and SDK version. Encoding can be slightly slower on the latest chips, but CPU load is lower too. We do not know why that is.

 

The encoding quality of Intel HD Graphics P4700 was comparable to that of P4600, but encoding was 14% faster on the E3-1285 v3 with the same resource consumption. Also notable: the E3-1285 v3 is 10% faster than the E3-1225 v3 when encoding with ffmpeg.
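
The timing rows can also be read as speedup factors relative to the ffmpeg column. The Python sketch below does that arithmetic with the times copied from the three tables; the dictionary layout and function names are our own choices for illustration, and the factors are simply the ffmpeg time divided by each other time.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to seconds."""
    return minutes * 60 + seconds


# Encoding times (minutes, seconds) copied from the result tables.
TIMES = {
    "E3-1225 v3": {
        "ffmpeg": (8, 42),
        "sample_full_transcode": (1, 19),
        "streambuilder (no optimization)": (2, 19),
        "streambuilder (optimization)": (1, 40),
    },
    "i7-3770": {
        "ffmpeg": (8, 48),
        "sample_full_transcode": (1, 24),
        "streambuilder (no optimization)": (2, 31),
        "streambuilder (optimization)": (1, 23),
    },
    "E3-1285 v3": {
        "ffmpeg": (8, 1),
        "sample_full_transcode": (1, 11),
        "streambuilder (no optimization)": (2, 11),
        "streambuilder (optimization)": (1, 34),
    },
}


def speedups(times):
    """Return how many times faster each column is than ffmpeg."""
    baseline = to_seconds(*times["ffmpeg"])
    return {
        name: baseline / to_seconds(*value)
        for name, value in times.items()
        if name != "ffmpeg"
    }


for cpu, times in TIMES.items():
    for name, factor in speedups(times).items():
        print(f"{cpu:12s} {name:33s} {factor:.1f}x")
```

Note that this compares wall-clock time only; the cpu (max) rows show that the Quick Sync Video paths also left most of the CPU free during those runs.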

 

A server with StreamBuilder installed and Quick Sync Video enabled can encode one input stream into 12 Full HD (1080p) HLS streams, 24 HD (720p) HLS streams, or 46 SD (480p) HLS streams.
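
To put these stream counts in perspective, the Python sketch below compares them with the original goal of 4 Full HD streams and estimates the total pixels per output frame set. The article does not state exact frame sizes, so the 16:9 dimensions used here (1920x1080, 1280x720, 854x480) are assumptions.

```python
GOAL_FULL_HD_STREAMS = 4  # the initial goal stated earlier in the article

# Stream counts from the article; frame sizes are ASSUMED 16:9 dimensions.
RENDITIONS = {
    "1080p": {"width": 1920, "height": 1080, "streams": 12},
    "720p": {"width": 1280, "height": 720, "streams": 24},
    "480p": {"width": 854, "height": 480, "streams": 46},
}


def total_megapixels(width, height, streams):
    """Pixels in one frame of every simultaneous output stream, in millions."""
    return width * height * streams / 1e6


goal_ratio = RENDITIONS["1080p"]["streams"] / GOAL_FULL_HD_STREAMS
print(f"Full HD streams vs. goal: {goal_ratio:.0f}x")

for name, r in RENDITIONS.items():
    mp = total_megapixels(r["width"], r["height"], r["streams"])
    print(f"{name}: {r['streams']} streams, {mp:.2f} megapixels per frame set")
```

Under these assumed frame sizes the totals come to roughly 24.9, 22.1 and 18.9 megapixels, which suggests that stream count is limited by some per-stream overhead as well as by raw pixel throughput.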

 

Also, optimized memory operations reduced RAM consumption by half.

 

We exceeded our initial goal threefold! We can now encode several times more streams on hardware much cheaper than what we used before.

 

You can try StreamBuilder too: just email us at ask@streambuilder.pro and we'll send you a demo distribution.

 

Conclusion

 

The Intel Media Server Studio SDK makes it possible to build cost-effective, high-performance encoding/transcoding servers with high stream density per rack unit. While integrating it with StreamBuilder, we hit some initial bumps caused by the motherboard's built-in GPU, which were eventually resolved. The benefits were a significant performance boost with much lower CPU usage and low memory consumption, which freed resources for other tasks (even extra CPU encoding).
