Video Frame Display Synchronization

by Dan H. Nowlin,
Intel Corporation

Introduction

Discover how a simple algorithm can help synchronize content frame rate and display refresh rate to improve video playback quality.

Step by step, the digital home is becoming a reality. The last several years have seen increasing numbers of digital home devices become commercially available to consumers. From digital media adapters providing music and video remotely to complete entertainment systems within a single PC, the digital landscape is growing.

Media Center PCs - boasting the ability to watch and record television, as well as save, store, and render digital photos and music - are becoming a standard offering from PC companies. In addition, several vendors are providing kits that allow users to turn their PCs into Media Center PCs.

Unfortunately, these Media Center PCs do not always provide high-quality video. Several factors contribute to this poor showing: improper buffering and rendering of streaming content, failure to de-interlace interlaced content, and poor synchronization of video and audio all affect the quality of the video experience. Most of these problems and their solutions are well understood, and many products handle them well. However, there is also a lesser-known and more subtle problem that can lead to small but noticeable video hiccups. This paper explains the problem in detail and presents a solution.

More consumers are watching television on their PCs, thanks to the growing number of Media Center PCs being sold. As this group expands beyond the hobbyist/enthusiast stage, the demand for better quality video will increase. In order to sustain the growth of the Media Center PC market, these video hiccups must be addressed.

Things can be done to improve video playback quality on PCs, and many video independent software vendors (ISVs) are doing them. However, an often-overlooked aspect of video playback is that software displaying video frames must take the refresh cycle of the display device into account and synchronize with it. Whereas a television synchronizes with the video signal received from the broadcast studio, a computer monitor refreshes at a fixed rate set by the graphics adapter clock, which is completely unrelated to the video input. This difference can cause significant problems when synchronizing video with a PC monitor. The rest of this paper describes the problem and a solution in detail. Before diving into the problem, a few concepts are introduced to give the reader a firm foundation.


Display Refresh Cycle

A PC monitor’s refresh timing synchronizes with the frequency of the graphics adapter clock. For example, a graphics card and a monitor may both support 60 Hz. This combination works because the monitor can synchronize with the 60 Hz signal from the graphics card. In fact, the monitor can synchronize even when there are minor variations in the frequency of the graphics adapter’s output (such as 60.06 Hz instead of 60 Hz).

During the refresh cycle, the display on the PC monitor is redrawn from the current display buffer in the PC graphics adapter’s address space. One by one, each horizontal line on the display is updated with the new image from the graphics buffer. The line currently being updated is known as the scan line. Because this refresh process happens 60 times per second for a 60 Hz graphics adapter, the image on the PC monitor is updated 60 times per second as well.


Figure 1 – Display Refresh

Tearing Artifact

There is a potential problem with updating the graphics adapter’s display buffer at the wrong time. If the video memory buffer is updated while the monitor is partway through its refresh cycle, then only the portion of the screen below the current scan line shows the new image during that refresh cycle (see Figure 2). This artifact, where the top part of the screen shows an older image and the bottom part shows the new image, is known as tearing. The term is apt, because the resulting image can appear torn in two.


Figure 2 – Tearing Artifact


The Flip Command

One way to prevent tearing is to always make sure that the video memory is updated just after one monitor refresh cycle has ended and just before the next cycle begins (in other words, updated during vertical retrace). However, this puts a tremendous burden on software to time the graphic updates very precisely.

For this reason, the Flip command was devised. The Flip command is quite simple: it allows software to submit the new image at any time within the refresh cycle, but the displayed image is not actually updated until the current refresh cycle ends. Thus, the new image appears on the monitor during the refresh cycle that follows the Flip command’s execution. With Flip, tearing is never observed, because the command guarantees that the whole new image is displayed within a single refresh cycle (see Figure 3). However, as the next section shows, using Flip alone does not eliminate every problem.


Figure 3 – Flip Command Sequence
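For readers who want to see roughly what this looks like in code, the fragment below sketches a Flip on a DirectDraw flipping chain. It is a minimal illustration only: the surface names (g_primary, g_backBuffer) and the omitted drawing step are assumptions, and the surfaces are presumed to have been created elsewhere as a primary surface with one attached back buffer.

```cpp
#include <windows.h>
#include <ddraw.h>

// Assumed to be created elsewhere as a DirectDraw flipping chain:
// a primary (visible) surface with one attached back buffer.
extern LPDIRECTDRAWSURFACE7 g_primary;     // currently displayed surface
extern LPDIRECTDRAWSURFACE7 g_backBuffer;  // surface each new frame is drawn into

HRESULT PresentFrame()
{
    // ... draw or copy the decoded video frame into g_backBuffer here ...

    // Queue the buffer swap. The swap itself takes effect at the next
    // vertical retrace, so a partially updated image is never shown.
    // DDFLIP_WAIT retries if the hardware cannot accept the flip yet.
    return g_primary->Flip(NULL, DDFLIP_WAIT);
}
```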


Potential Problems with Flip

While the Flip construct is a great innovation to easily allow software to prevent the tearing artifact, there is still one potential problem.

When using Flip, the rules that apply to software video rendering change. Before Flip, software had to ensure that each graphics frame update was performed at the proper frame time. With Flip, the monitor’s refresh cycle frequency becomes the only clock that frame display timing can be based on. In other words, a new frame can now be displayed only at the beginning of a refresh cycle, so frame display times are essentially locked in step with the display’s refresh frequency.


Figure 4 – Frame Rate and Display Rate Mismatch

This insight reveals that unless the monitor’s refresh rate exactly matches, or is an integer multiple of, the delivered content’s frame rate, a perfect rendition of the content on the display is not possible. Figure 4 shows an example of this problem. In this scenario, the content frame rate is running slower than the display rate. Because of the phase shift between the two frequencies, the Flip times for two frames eventually span a complete refresh cycle (see the Flips for frames 3 and 4). This causes frame 3 to be displayed for twice as long as the other frames. This problem demonstrates why it is best to have a perfect “fit” between the frame rate and the display rate, although that is not always possible.
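A little arithmetic gives a feel for how often such a doubled frame occurs. The helper below is purely illustrative (it is not part of any playback API); it simply inverts the difference between the two rates.

```cpp
#include <cmath>
#include <cstdio>

// Rough estimate of how often a frame must be held for an extra refresh
// (content slower than the display) or dropped (content faster than the
// display). Illustrative arithmetic only; assumes the two rates differ.
double SecondsBetweenGlitches(double contentFps, double refreshHz)
{
    return 1.0 / std::fabs(refreshHz - contentFps);
}

int main()
{
    // 59.94 fps content on a 60 Hz display: one frame is held for an
    // extra refresh roughly every 16.7 seconds.
    std::printf("%.1f s\n", SecondsBetweenGlitches(59.94, 60.0));

    // 50 fps content on a 60 Hz display: a frame is doubled ten times
    // per second, i.e. every 0.1 seconds.
    std::printf("%.1f s\n", SecondsBetweenGlitches(50.0, 60.0));
    return 0;
}
```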

The situation can become much more pronounced when the difference between the frame rate and the display rate is small. When the frame times begin to occur very close to the refresh start times, small inaccuracies in the software’s timer can cause several consecutive Flips to stutter back and forth across the refresh start time threshold. This means that some Flips are done too early and some too late, resulting in several “over-displayed” and “under-displayed” frames. Figure 5 details this situation: the timer is inaccurate (it has jitter), which causes frames 2 and 4 not to be displayed at all and frames 3 and 5 to be displayed twice.


Figure 5 – Flip with inaccurate timer

This case can occur even when the content frame rate and the display refresh rate are the same. Clearly, just a timer and the Flip command are not good enough to ensure quality video. As will be shown in the next section, to handle Flips properly, software must synchronize with the display’s refresh cycle.


Properly Timing Flips

As discussed earlier, using the Flip command brings the display’s refresh cycle squarely into play when timing video frame rendering. Each newly delivered frame is displayed for one or more complete refresh cycles, starting at a refresh boundary. So when using the Flip command, software must not only accurately predict when each frame should be displayed, it must also determine which specific refresh cycle best meets the frame’s display time.

It is best to execute Flips early in the refresh cycle immediately preceding the frame’s targeted refresh slot (see Figure 3 for an example). This gives the greatest chance of actually executing the Flip command before the target refresh cycle starts and ensures that the frame is displayed at the correct time. Note that when the content frame rate and the display refresh rate do not match and are not even multiples of one another, the “best-fit” refresh cycle for a particular frame may not produce acceptable video quality. There may be ways to generate or alter frame content to work around these problems, but they are beyond the scope of this paper.

Some operating systems expose programming interfaces that allow software applications to synchronize with the display’s refresh cycle. For example, Microsoft’s DirectX* 9.0 contains some useful routines for doing just that. This paper will concentrate on the DirectX routines as an example of the techniques. Readers can use these examples to learn the methods, and then determine what support exists in other operating systems.

WaitForVerticalBlank() is a DirectDraw method (on the IDirectDraw interface) that blocks the calling thread until the beginning of the next refresh cycle. Because the call has high overhead, it should be used only once or at a very low rate; however, it is well suited to performing the initial synchronization with the refresh cycle.
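As a rough sketch of that initial synchronization (assuming an IDirectDraw7 interface pointer, g_ddraw, has already been obtained elsewhere, and using QueryPerformanceCounter for the timestamp):

```cpp
#include <windows.h>
#include <ddraw.h>

extern LPDIRECTDRAW7 g_ddraw;  // assumed: a valid IDirectDraw7* created elsewhere

// Block until the next vertical blank begins and record the time.
// From this single, expensive call onward, the application can
// extrapolate refresh boundaries with its own timer (plus GetScanLine(),
// described next) instead of blocking on every frame.
LONGLONG SyncToVerticalBlank()
{
    g_ddraw->WaitForVerticalBlank(DDWAITVB_BLOCKBEGIN, NULL);

    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);  // high-resolution timestamp of the refresh start
    return now.QuadPart;
}
```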

GetScanLine() retrieves the scan line currently being updated on the display. If both the total number of scan lines and the current scan line are known, it is easy to determine how far the display is through its refresh cycle. For example, if the total number of scan lines is 1024 and GetScanLine() returns 100, the current refresh cycle is 100 divided by 1024, or about 10 percent, complete. GetScanLine() therefore allows an application to track where it is in the refresh cycle, determine which refresh cycle to target for the next frame, and set a timer for the appropriate Flip time. An example algorithm is given below:


Figure 6
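One possible shape for such an algorithm is sketched below. The structure, the helper names, and the use of Sleep() as the timer are illustrative assumptions rather than a definitive implementation; a real player would use a finer-grained wait and consult the GetScanLine() fraction described above to correct for timer drift.

```cpp
#include <windows.h>
#include <ddraw.h>
#include <cmath>

// Sketch of refresh-aware flip scheduling:
//  1. Synchronize once with the refresh cycle (WaitForVerticalBlank).
//  2. For each frame, pick the refresh cycle whose start time is closest
//     to the frame's ideal display time (the "best fit").
//  3. Wake early in the refresh cycle preceding the target one and issue
//     the Flip, so the swap lands on the target refresh.
extern LPDIRECTDRAW7        g_ddraw;       // assumed: created elsewhere
extern LPDIRECTDRAWSURFACE7 g_primary;     // assumed: flipping-chain front buffer
extern LPDIRECTDRAWSURFACE7 g_backBuffer;  // assumed: back buffer frames are drawn into

void PlaySequence(int frameCount, double frameRate, double refreshHz)
{
    LARGE_INTEGER freq, t0, now;
    QueryPerformanceFrequency(&freq);

    // Step 1: one-time, high-overhead synchronization with the refresh cycle.
    g_ddraw->WaitForVerticalBlank(DDWAITVB_BLOCKBEGIN, NULL);
    QueryPerformanceCounter(&t0);                   // time of a refresh start

    const double refreshPeriod = 1.0 / refreshHz;   // seconds per refresh cycle

    for (int n = 0; n < frameCount; ++n) {
        // Step 2: ideal display time of frame n (relative to t0) and the
        // refresh cycle whose start is closest to it.
        double frameTime     = n / frameRate;
        long   targetRefresh = (long)std::floor(frameTime / refreshPeriod + 0.5);
        if (targetRefresh < 1)
            targetRefresh = 1;

        // Step 3: sleep until early in the refresh cycle just before the
        // target one, then flip.
        double wakeTime = (targetRefresh - 1) * refreshPeriod + 0.1 * refreshPeriod;
        QueryPerformanceCounter(&now);
        double elapsed = (double)(now.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        if (wakeTime > elapsed)
            Sleep((DWORD)((wakeTime - elapsed) * 1000.0));

        // ... draw frame n into g_backBuffer here ...
        g_primary->Flip(NULL, DDFLIP_WAIT);
    }
}
```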

Frame times are not chosen based solely on the frame rate but on a combination of the frame rate and the refresh rate. Since frames can be displayed only at display refresh times, it is important to target the best-fit refresh cycle for each frame. For this reason, it is best to have a content frame rate that exactly matches the display refresh rate; if that were the case, each and every frame could be drawn at its actual frame time.


Another Option for Recorded Content

While these problems apply to all video playback scenarios, both live and recorded content, it may be possible to ease the problem when playing recorded content. If the difference between the content frame rate and the display refresh rate is small, it is possible to adjust the frame rate of the video (and adjust the audio accordingly) to match the refresh rate without harming the quality of the content. An example is standard-definition television content at 59.94 frames per second (de-interlaced with the Bob algorithm) played back on a 60 Hz monitor. By speeding up both video and audio to 60 frames per second, the frame times match the refresh times and no artifacts occur.
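The required speed change is tiny, which is why it is generally imperceptible. The arithmetic below (a plain illustration, not tied to any media API) shows that it amounts to playing the content about 0.1 percent faster; audio must be resampled or time-stretched by the same factor to preserve lip sync.

```cpp
#include <cstdio>

int main()
{
    // Speed-up factor needed to play 59.94 fps content on a 60 Hz display.
    double contentFps = 60000.0 / 1001.0;        // NTSC rate, approximately 59.94
    double refreshHz  = 60.0;
    double speedup    = refreshHz / contentFps;  // 1.001, i.e. 0.1% faster

    std::printf("speed-up factor: %.6f (%.3f%% faster)\n",
                speedup, (speedup - 1.0) * 100.0);
    return 0;
}
```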


Summary

This paper examines Flip and its intended use in preventing tearing artifacts. It discusses how the use of Flip can cause problems because new frames are displayed only on refresh cycles of the display; when normal frame timing is used to generate Flips, frame display times and durations can differ from what the application expects. Finally, it shows that the proper technique is to synchronize Flips with the display’s refresh and pick a best-fit refresh cycle for each frame to be displayed in. Software can then time the Flips to hit the chosen refresh cycle. Video quality is best when the content frame rate and the display refresh rate are the same. However, since this rate matching is not always possible, algorithms to minimize the artifacts discussed here should be used.


Additional References

Microsoft DirectX*


Comments

peter-gerdes:

No, the theoretically optimal solution isn't a callback. Any callback-based or flip-based approach limits the apparent frame rate to the CPU's ability to produce frames (in whatever form: surfaces, RGB arrays, etc.). Ideally, an infinitely fast GPU ought to be able to increase the smoothness of the display above the rate at which the CPU can push new frames to the GPU. Worse, a flip/callback-based approach takes up CPU time interpolating frames instead of running application logic or initiating new effects.

The ideal approach would be to pass objects to the video card/driver as 4D surfaces (in the physics sense). That is, your surfaces, shading, bump maps, and so forth would be augmented with some simple class of time-dependent evolution functions. For instance, a character in the process of lifting his leg would be rendered by passing in some basic surface representing the thigh along with, say, linear rotation/translation functions of time approximating the position of that surface over a short time period. The bump map on the surface would also be augmented with an additional value giving the rate of change of the height field at each point. The CPU code could be totally ignorant of the actual display rate; all it need concern itself with is supplying corrections to the approximate motions it sent to the GPU before they deviate too far from the ideal (including making sure the video card/driver is passed new objects before the temporal approximation demands they be materialized).

This has another substantial benefit. With the right choice of display primitives and graphics specification language, it would likely allow one to actually extract the O(n log n) performance raytracing seems to offer and improve on the O(n^2) rasterization-style algorithms. Raytracing here is understood generally as an algorithm that uses some kind of spatial subdivision structure to resolve the light incident on ANY point from a given direction, and rasterization is likewise taken generally as an algorithm that performs some semi-global (no full BSP) operation on objects to compute the light incident at a given viewpoint (in particular, I would call Carmack's proposal for fully dynamic texture sampling a partial raytracing algorithm).

Anyway, with all those caveats said, the reason raytracing still has performance disadvantages relative to rasterization is that the O(n log n) complexity assumes your objects are nicely arranged in a BSP or the like, which is fine for static objects, but fully recomputing the BSP for each scene eats up the raytracing advantage. However, almost all scene changes are continuous modifications (or appearances/disappearances) of spatially localized objects (like people, trees, bullets, etc.). If these changes were represented as such, rather than leaving the GPU to effectively pretend it is starting from scratch when something changes, then I strongly suspect you could use this information to locally update some kind of spatial subdivision structure efficiently and realize both the speed and the better handling of reflectivity provided by raytracing algorithms.

anonymous:

Hi,

Very good article.
What happens when you want to display two or more camera videos on one display, for a video surveillance system for example?
In my application, when I synchronize swapping on vertical retrace with a multi-video system, I divide the display rate by the number of cameras.
For example, with four cameras and a 60 Hz LCD display, each video is displayed at 15 Hz.
One solution is to have a process for each video, but for other reasons this is inappropriate. I have only one process and one thread for the application.
I have a WaitForVerticalBlank() for each video. The first video blocks until the vertical retrace, then I display the second video and wait for the next vertical retrace, and so on.

Any ideas?

anonymous:

With so many graphics cards not able to output to TV devices, shouldn't there be a simple callback mechanism to output video? This kind of peek-message thing seems childish when all you want is a reliable timer to output DirectX surfaces. You would register your callback to connect to the device output timing. If you delay the output more than a few milliseconds, you are unlinked from the callback chain and never called again. So it's up to you to put very little code in the callback function. This would prevent someone from stalling the output of the display device. Or do you have a better idea? My question to all the wizards out there is: why does this ancient problem still exist? CPUs are 3 GHz and we still can't reliably send video to a graphics card. We still don't have timers we can depend on. It's crazy.

anonymous:

Hi,

I'm using a link to your excellent article on my blog to describe the importance of rate matching. Hope that's ok.

I've designed a renderer that addresses the problem you describe in two alternative ways: by synchronizing the display refresh rate to the incoming video frame rate or by synchronizing the incoming video frame rate to the display refresh rate. The first alternative is the most attractive from a theoretical point of view as it also works for live sources such as TV. That solution unfortunately doesn't work with all gfx boards and displays so I've also implemented the second one that works for all recorded sources.

If Intel would design a gfx board/driver that allowed me to fine-tune the output refresh rate in a clean way from my application, I (and a few more enthusiasts) would become Intel customers forever :-)

For an intro to my player see http://www.ostrogothia.com/video/?page_id=5.

Cheers!

Arto
