Archived - Chat Heads with Intel® RealSense™ SDK Background Segmentation Boosts e-Sport Experience

The Intel® RealSense™ SDK has been discontinued. No ongoing support or updates will be available.

Download Code Sample

Intel® RealSense™ Technology can be used to improve the e-sports experience for both players and spectators by allowing them to see each other on-screen during the game. Using background segmented video (BGS), players’ “floating heads” can be overlaid on top of a game using less screen real-estate than full widescreen video and graphics can be displayed behind them (like a meteorologist on television). Giving players the ability to see one another while they play in addition to speaking to one another will enhance their overall in-game communication experience. And spectators will get a chance to see their favorite e-sports competitors in the middle of all the action.

In this article, we will discuss how this technique is made possible by the Intel® RealSense™ SDK. This sample will help you understand the various pieces in the implementation (using the Intel RealSense SDK for background segmentation, networking, video compression and decompression), the social interaction, and the performance of this use case. The code in this sample is written in C++ and uses DirectX*.

Figure 1:Screenshot of the sample with two players with a League of Legends* video clip playing in the background.

Figure 2:Screenshot of the sample with two players and a Hearthstone* video clip playing in the background.

Installing, Building, and Running the Sample

Download the sample at:

The sample uses the following third-party libraries: 
(i) RakNet for networking 
(ii) Theora Playback Library to play back ogg videos 
(iii) ImGui for the UI 
(iv) Windows Media Foundation* (WMF) for encoding and decoding the BGS video streams

(i) and (ii) are dynamically linked (required dll’s are present in the source repo), while (iii) is statically linked with source included.
(iv) is dynamically linked, and is part of the WMF runtime, which should be installed by default on a Windows* 8 or greater system. If it is not already present, please install the Windows SDK.

Install the Intel RealSense SDK (2015 R5 or higher) prior to building the sample. The header and library include paths in the Visual Studio project use the RSSDK_DIR environment variable, which is set during the RSSDK installation.

The solution file is at ChatheadsNativePOC\ChatheadsNativePOC and should build successfully with VS2013 and VS2015.

Install the Intel® RealSense™ Depth Camera Manager, which includes the camera driver, before running the sample. The sample has been tested on Windows® 8.1 and Windows® 10 using both the external and embedded Intel® RealSense™ cameras.

When you start the sample, the option panel shown in Figure 3 displays:

Figure 3:Option panel at startup.

  • Scene selection: Select between League of Legends* video, Hearthstone* video and a CPUT (3D) scene. Click the Load Scene button to render the selection. This does not start the Intel RealSense software; that happens in a later step.
  • Resolutions: The Intel RealSense SDK background segmentation module supports multiple resolutions. Setting a new resolution results in a shutdown of the current Intel RealSense SDK session and initializes a new one.
  • Is Server / IP Address: If you are running as the server, check the box labeled Is Server. 
    If you are running as a client, leave the box unchecked and enter the IP address you want to connect to.

    Hitting Start initializes the network and Intel RealSense SDK and plays the selected scene. The maximum number of connected machines (server plus client(s)) is hardcoded to 4 in the file NetworkLayer.h

    Note: While a server and client can be started on the same system, they cannot use different color stream resolutions. Attempting to do so will crash the Intel RealSense SDK runtime since two different resolutions can’t run simultaneously on the same camera.

    After the network and Intel RealSense SDK initialize successfully, the panels shown in Figure 4 display:

Figure 4:Chat Heads option panels.

The Option panel has multiple sections, each with their own control settings. The sections and their fields are:

  • BGS/Media controls
    • Show BGS Image – If enabled, the background segmented image (i.e., color stream without the background) is shown. If disabled, the color stream is simply used (even though BGS processing still happens). This affects the remote chat heads as well (that is, if both sides have the option disabled, you’ll see the remote players’ background in the video stream).

      Figure 5:BGS on (left) and off (right). The former blends into Hearthstone*, while the latter sticks out

    • Pause BGS - Pause the Intel RealSense SDK BGS module, suspending segmentation processing on the CPU
    • BGS frame skip interval - The frequency at which the BGS algorithm runs. Enter 0 to run every frame, 1 to run once in two frames, and so on. The limit exposed by the Intel RealSense SDK is 4.
    • Encoding threshold – This is relevant only for multiplayer scenarios. See the Implementation section for details.
    • Decoding threshold - This is relevant only for multiplayer scenarios. See the Implementation section for details.
  • Size/Pos controls
    • Size - Click/drag within the boxes to resize the sprite. Use it with different resolutions to compare quality.
    • Pos - Click/drag within the boxes to reposition the sprite.
  • Network control/information (This section is shown only when multiple players are connected)
    • Network send interval (ms) - how often video update data is sent.
    • Sent - Graph of data sent by a client or server.
    • Rcvd - Graph of data received by a client or server. Clients send their updates to the server, which then broadcasts it to the other clients. For reference, to stream 1080p Netflix* video, the recommended b/w required is 5 Mbps (625 KB/s).
  • Metrics
    • Process metrics
      • CPU Used - The BGS algorithm runs on several Intel® Threading Building Blocks threads and in the context of a game, can use more CPU resources than desired. Play with the Pause BGS and BGS frame skip interval options and change the Chat Head resolution to see how it affects the CPU usage.


Internally, the Intel RealSense SDK does its processing on each new frame of data it receives from the Intel RealSense camera. The calls used to retrieve that data are blocking, making it costly to execute this processing on the main application thread. Therefore, in this sample, all of the Intel RealSense SDK processing happens on its own dedicated thread. This thread and the application thread never attempt to write to the same objects, making synchronization trivial.

There is also a dedicated networking thread that handles incoming messages and is controlled by the main application thread using signals. The networking thread receives video update packets and updates a shared buffer for the remote chat heads with the decoded data.

The application thread takes care of copying the updated image data to the DirectX* texture resource. When a remote player changes the camera resolution, the networking thread sets a bool for recreation, and the application thread takes care of resizing the buffer, recreating the DirectX* graphics resources (Texture2D and ShaderResourceView) and reinitializing the decoder.

Figure 6 shows the post-initialization interaction and data flow between these systems (threads).

Figure 6: Interaction flow between local and remote Chat Heads.

Color Conversion

The Intel RealSense SDK uses 32-bit BGRA (8 bits per channel) to store the segmented image, with the alpha channel set to 0 for background pixels. This maps directly to the DirectX texture format DXGI_FORMAT_B8G8R8A8_UNORM_SRGB for rendering the chat heads. In this sample, we convert the BGRA image to YUYV, wherein every pair of BGRA pixels is combined into one YUYV pixel. However, YUYV does not have an alpha channel, so to preserve the alpha from the original image, we set the Y, U, and V channels all to 0 in order to represent background segmented pixels.

The YUYV bit stream is then encoded using WMF’s H.264 encoder. This also ensures better compression, since more than half the image is generally comprised of background pixels.

When decoded, the YUYV values meant to represent background pixels can be non-zero due to the lossy nature of the compression. Our workaround is to use 8 bit encoding and decoding thresholds, exposed in the UI. On the encoding side, if the alpha of a given BGRA pixel is less than the encoding threshold, then the YUYV pixel will be set to 0. Then again, on the decoding side, if the decoded Y, U, and V channels are all less than the decoding threshold, then the resulting BGRA pixel will be assigned an alpha of 0.

When the decoding threshold is set to 0, you may notice green pixels (shown below) highlighting the background segmented image(s). This is because in YUYV, 0 corresponds to the color green and not black as in BGRA (with non-zero alpha). When the decoding threshold is set to 0, you may notice green pixels (shown below) highlighting the background segmented image(s). This is because in YUYV, 0 corresponds to the color green and not black as in BGRA (with non-zero alpha).

Figure 7: Green silhouette edges around the remote player when a 0 decoding threshold is used


The amount of data sent over the network depends on the network send interval and local camera resolution. The maximum send rate is limited by the 30 fps camera frame rate, and is thus 33.33 ms. At this send rate, a 320x240 resolution video feed consumes 60-75 KBps with minimal motion (kilobytes per second) and 90-120 KBps with more motion. Note that the bandwidth figures depend on the number of pixels covered by the player. Increasing the resolution to 1280x720 doesn’t impact the bandwidth cost all that much; the net increase is around 10-20 KBps, since a sizable chunk of the image is the background (YUYV set to 0) which results in much better compression.
Reducing the send interval to 70ms results in a bandwidth consumption of ~20-30 KBps. 


The sample uses Intel® Instrumentation and Tracing Technology (Intel® ITT) markers and Intel® VTune Amplifier XE to help measure and analyze performance. To enable them, uncomment

//#define ENABLE_VTUNE_PROFILING // uncomment to enable marker code

In the file


and rebuild.

With the instrumentation code enabled, an Intel® VTune concurrency analysis of the sample can help understand the application’s thread profile. The platform view tab shows a colored box (whose length is based on execution time) for every instrumented section, and can help locate bottlenecks. The following capture was taken on an Intel® Core™ i7-4770R processor (8 logical cores) with varying BGS work. The “CPU Usage” row on the bottom shows the cost of executing the BGS algorithm every frame, every alternate frame, once in three frames and when suspended. As expected, the TBB threads doing the BGS work have lower CPU utilization when frames are skipped.

Figure 8: VTune concurrency analysis platform view with varying BGS work

A closer look at the RealSense thread shows the RSSDK AcquireFrame() call taking ~29-35ms on average, which is a result of the configured frame capture rate of 30 fps.

Figure 9: Closer look at the RealSense thread. The thread does not spin, and is blocked while trying to acquire the frame data

The CPU usage info can be seen via the metrics panel of the sample as well, and is shown in the table below:

BGS frequencyChat Heads CPU Usage (approx.) 
Every frame23%
Every alternate frame19%
Once in three frames16%
Once in four frames13%

Doing the BGS work every alternate frame, or once in three frames, results in a fairly good experience when the subject is a gamer because of minimal motion. The sample currently doesn’t update the image for the skipped frames – it would be interesting to use the updated color stream with the previous frame’s segmentation mask instead.


The Chat Heads usage enabled by Intel RealSense technology can make a game truly social and improve both the in-game and e-sport experience without sacrificing the look, feel and performance of the game. Current e-sport broadcasts generally show full video (i.e., with the background) overlays of the professional player (and/or) team in empty areas on the bottom UI. Using the Intel RealSense SDK's background segmentation, each players’ segmented video feed can be overlaid near the player’s character, without obstructing the game view. Combined with Intel RealSense SDK face tracking, it allows for powerful and fun social experiences in games. 


A huge thanks to Jeff Laflam for pair-programming the sample and reviewing this article. 
Thanks also to Brian Mackenzie for the WMF based encoder/decoder implementation, Doug McNabb for CPUT clarifications and Geoffrey Douglas for reviewing this article.


For more complete information about compiler optimizations, see our Optimization Notice.


Kiriakos T.'s picture

The camera sensor of 3D F200 seems to capture the person if he is really close to the camera. Is there any way to icrease the distance between the camera and the person without causing any problems to capture?

Add a Comment

Have a technical question? Visit our forums. Have site or software product issues? Contact support.