1. Gross Error Detection Methodology
The Video Gross Error Detection (GED) methodology is designed to provide a platform-independent, automated way to measure content delivery and quantify the video playback experience.
GED operates directly on the video file itself, instrumenting each frame with visible identifiers. These identifiers are composed of an array of color blocks that encodes an ordinal which represents the intended presentation order. To perform a GED measurement, the video is played through the system under test and the results captured. By examining the capture file, the GED can compare the observed results with the expected ordinal sequence to determine if any frames have been dropped, repeated or presented out of sequence.
1.2 GED Encoding
1.2.1 GED Instrumentation
To instrument the video media for Gross Error Detection, the GED replaces a portion of the video data in each frame. The frame identifier region is a rectangle defined by its upper-left and lower-right corners and may be placed anywhere within the frame. This region contains an array of color blocks (or a single color) that represents an ordinal for each frame, derived from the GED codec. Frame ordinals begin with zero and are incremented by one for each frame. During the decode phase, the GED samples the color blocks within the defined region to determine the encoded ordinal.
The GED algorithm creates frame identifiers from combinations of the maximum and minimum values for each channel in the required colorspace.
These values are represented by the corners in a full colorspace diagram:
Values alternate from least significant to most significant channel. For RGB, the channel order is BGR, reflecting the most common byte order for storing this format. For YUV formats, GED uses VUY as the channel order.
For example, for RGB24 the following sequence of 8 values is used:
Each color represents an ordinal in the frame identifier value space. Ordinals increment by one for each frame as described above. For instance, the first frame in the sequence is represented by black, while the third frame is represented by green. Using only fully saturated channels to encode frame identifiers gives the GED the ability to tolerate a wide range of image quality degradation caused by compression, digital to analog conversions and colorspace changes.
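The RGB24 identifier sequence can be sketched in a few lines of Python. The text confirms that ordinal 0 is black and ordinal 2 is green, and section 1.2.3 treats white as the last ordinal. The remaining colors below follow from an assumption: that bit 0 of the ordinal drives the blue channel, bit 1 green and bit 2 red (BGR, least to most significant). The function name and the 0/255 channel limits are illustrative only.

```python
# Assumption: bit 0 -> blue, bit 1 -> green, bit 2 -> red (BGR order,
# least to most significant). Channel extremes are taken as 0 and 255.
CHANNEL_MIN, CHANNEL_MAX = 0, 255


def rgb24_identifier(ordinal):
    """Return the (R, G, B) color for an ordinal in the range 0..7."""
    if not 0 <= ordinal < 8:
        raise ValueError("RGB24 identifiers cover ordinals 0..7")
    blue = CHANNEL_MAX if ordinal & 1 else CHANNEL_MIN
    green = CHANNEL_MAX if ordinal & 2 else CHANNEL_MIN
    red = CHANNEL_MAX if ordinal & 4 else CHANNEL_MIN
    return (red, green, blue)


# The full identifier sequence, indexed by ordinal.
RGB24_SEQUENCE = [rgb24_identifier(i) for i in range(8)]
```

Under this assumption the sequence runs black, blue, green, cyan, red, magenta, yellow, white.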
If the frame number is larger than the available range of frame identifiers, the identifier selected is the frame number modulo the identifier sequence length, per the following formula:
$$o = f \bmod s$$
where o is the GED ordinal, f is the current frame of the source file and s is the size of the GED identifier sequence
Composite frame identifier arrays can be constructed by marking multiple regions with the GED color sequence. Identifier arrays divide the total requested frame identifier size by the number of submarkers specified. For composite identifiers, the upper-left region represents the least significant portion of the ordinal. Ordinal significance grows from left to right and from top to bottom, with the lower-right region representing the most significant portion of the ordinal. Each region of the frame identifier represents a digit with a number base equal to the size of the GED sequence. For example, RGB24 would be base 8. The following diagram depicts a composite GED frame identifier:
The ordinal represented by a composite identifier is specified according to the following formula, with regions indexed from n = 0 at the upper-left:
$$o = \sum_{n=0}^{r-1} v_n \, s^n$$
where o is the GED ordinal, r is the total number of regions that make up the composite identifier, vn is the value of each individual region (left to right and top to bottom), and s is the total size of the individual identifier sequence
Composite identifiers allow larger numbers of unique frame ordinals to be defined. This provides more accurate results for larger video clips. For instance, a sequence instrumented with 8 unique frame identifiers can only detect a maximum of 7 consecutive dropped frames since the 8th frame will repeat the first value.
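As a rough illustration of the modulus rule and the composite encoding described above, the following Python sketch splits an ordinal into per-region digits and recombines them. The function names are hypothetical, the digit list is ordered least significant first (upper-left region first), and the sketch ignores the control-frame reservation described in section 1.2.3.

```python
def ged_ordinal(frame_number, sequence_size):
    """Wrap a frame number into the available identifier range."""
    return frame_number % sequence_size


def encode_composite(ordinal, regions, base):
    """Split an ordinal into one digit per region, least significant first."""
    ordinal %= base ** regions  # wrap if the ordinal exceeds the value space
    digits = []
    for _ in range(regions):
        digits.append(ordinal % base)
        ordinal //= base
    return digits


def decode_composite(digits, base):
    """Recombine region digits (least significant first) into an ordinal."""
    return sum(value * base ** n for n, value in enumerate(digits))


# A 3x3 RGB24 array has 9 regions in base 8.
digits = encode_composite(100, regions=9, base=8)
restored = decode_composite(digits, base=8)
```

For frame 100, the first three regions carry the digits 4, 4 and 1 (since 100 = 4 + 4·8 + 1·64) and the remaining six regions carry 0.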
1.2.2 Frame Identifier Size and Placement
Frame identifiers can be placed anywhere within the frame. The upper-left corner is a reasonable default. However, for certain configurations it is helpful to move the marker away from the corner. For instance, video capture devices can crop or compress the edges of the captured image. New coordinates may need to be used during the analysis stage if the capture process changes the location or size of the frame identifiers.
GED frame identifiers should be at least 5% of the width of the video clip if digital-to-analog conversions are part of the system under test. For example, DVD resolution video (720 by 480) should use blocks 36 by 36 pixels or greater. Certain capture devices, such as low-end TV tuner cards, can introduce significant noise and edge compression, so larger blocks may be necessary for these applications.
Keeping the color block much smaller than the video resolution minimizes its effect on the compression codec used. While it is possible to cover the entire frame with the GED pattern, this is not recommended since it does not provide a realistic test. It also makes it difficult to examine the possible content-specific end user impact of a given error. (For example, individual dropped frames may be more noticeable in scenes with fast action or high camera motion.) However, this may be appropriate in certain situations where automatic generation of content is required.
The following diagram depicts the GED instrumentation procedure:
1.2.3 Control Frames
For composite frame identifier sequences, the last ordinal in the high-order region is reserved for special control frames (i.e., white for the RGB24 colorspace). When a control frame is detected, the rest of the identifier is considered an argument that specifies the type of frame and any available options.
GED currently defines a single control frame, the start of sequence identifier. Additional optional control frames are reserved for use in specific GED implementations. The start of sequence identifier is defined only for identifiers at least two regions wide and two regions high. The right-most column should be white, while the remaining columns should alternate from most to least significant color channel per row. The following diagram illustrates a start of sequence identifier for a 3x3 array in the RGB24 colorspace:
When a start of sequence identifier is detected, GED resets the ordinal sequence to zero and increments a counter that indicates the current sequence. Start of sequence markers have two primary functions. The first is to define the region of interest for a capture operation. For instance, additional content may be added to the beginning of a source clip for a video streaming test to allow time to start the capture process. Start markers can be used to flag the beginning of the test content so only this region of the video clip is scored by GED. The following diagram illustrates this process:
Start of sequence identifiers also allow multiple video clips of different types to be concatenated into a single, larger test clip and scored independently, as shown below:
Using multiple clips of different content types facilitates well-balanced video tests that do not prejudice one system under test over another.
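The start of sequence handling described in this section can be sketched as a simple pass over the decoded frames. This is an illustration only: the `START` sentinel and the function name are hypothetical, and the sketch assumes a marker may persist for several consecutive frames, in which case it is counted once.

```python
START = "start"  # hypothetical sentinel for a decoded start of sequence frame


def split_sequences(decoded):
    """Group decoded ordinals into one list per start of sequence marker.

    Frames seen before the first marker are outside the region of
    interest and are skipped.
    """
    sequences = []
    current = None
    for value in decoded:
        if value == START:
            # Open a new sequence unless the marker is simply repeating.
            if current is None or current:
                current = []
                sequences.append(current)
        elif current is not None:
            current.append(value)
    return sequences


# Lead-in content (5, 6), then two independently scored clips.
clips = split_sequences([5, 6, START, 0, 1, 2, START, START, 0, 1])
```

Here `len(clips)` plays the role of the sequence counter, and each inner list restarts from ordinal zero.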
1.2.4 Interlaced Video Handling
GED operations are defined per frame, with each color change occurring on successive frames rather than interlaced fields. When working with interlaced formats, field-level precision is achieved by examining the scanline where the error occurred in a progressive format capture file. This method supports video formats with more than two vertical fields. It also allows the GED to operate identically with progressive and interlaced content and display technologies, as well as facilitating conversions between the two without worrying about the effect on the GED sequence.
1.3 GED Decoding
1.3.1 GED Analysis
GED reads instrumented media, sampling the values within the defined frame identifier region. This media is typically a capture file that records video from the system under test. The sampled pixels are examined by GED to determine which frame identifier (if any) is present in the specified region.
During the analysis process, GED samples the requested region of the test clips for the frame identifier sequence. While some capture operations (such as direct writes to disk from display memory) preserve the exact location of the frame identifier, often the coordinates need to be adjusted between the encode and the decode stage.
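The GED divides the requested sample region into the required number of sub-regions (9 for a 3x3 array), taking a simple average of the color value in each sub-region:
$$v = \frac{1}{(x_{max} - x_{min} + 1)(y_{max} - y_{min} + 1)} \sum_{x = x_{min}}^{x_{max}} \sum_{y = y_{min}}^{y_{max}} c(x, y)$$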
where v is the sampled color value, xmin is horizontal minimum, xmax is the horizontal maximum, ymin is vertical minimum, ymax is the vertical maximum, and c is a function that returns the color value of the pixel specified by its horizontal and vertical location in the frame buffer
Edge padding around each of the sub-regions can also be specified; typically, a one or two pixel border should be discarded to minimize the effect of blurring or brightness falloff on the average color value. The practical minimum size of each region is defined by the following two formulas:
where xmin is horizontal minimum, xmax is the horizontal maximum, h is the number of horizontal regions and ph is the amount of horizontal padding
where ymin is the vertical minimum, ymax is the vertical maximum, v is the number of vertical regions and pv is the amount of vertical padding
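The following formula specifies pixel sampling with edge padding:
$$v = \frac{1}{(x_{max} - x_{min} - 2p_h + 1)(y_{max} - y_{min} - 2p_v + 1)} \sum_{x = x_{min} + p_h}^{x_{max} - p_h} \sum_{y = y_{min} + p_v}^{y_{max} - p_v} c(x, y)$$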
where v is the sampled color value, xmin is horizontal minimum, xmax is the horizontal maximum, ymin is vertical minimum, ymax is the vertical maximum, ph is the amount of horizontal padding, pv is the amount of vertical padding, and c is a function that returns the color value of the pixel specified by its horizontal and vertical location in the frame buffer
Once an average color value has been obtained for each region, the nearest matching ordinal is selected based on the color to ordinal map for the specified color space. For composite sequences, a nearest match operation is performed for each region.
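The sampling and matching steps can be sketched as follows. This is a minimal illustration rather than the GED implementation: it assumes the frame is a list of rows of (R, G, B) tuples, that the region bounds are inclusive, that "nearest" means smallest squared Euclidean distance in RGB space, and that the RGB24 sequence uses blue as the least significant channel. All names are hypothetical.

```python
# Assumed RGB24 identifier sequence as (R, G, B), indexed by ordinal,
# with blue as the least significant channel.
RGB24_SEQUENCE = [
    (0, 0, 0), (0, 0, 255), (0, 255, 0), (0, 255, 255),
    (255, 0, 0), (255, 0, 255), (255, 255, 0), (255, 255, 255),
]


def average_color(frame, x_min, x_max, y_min, y_max, pad_h=0, pad_v=0):
    """Average the pixels of one sub-region, discarding the padded border."""
    xs = range(x_min + pad_h, x_max - pad_h + 1)
    ys = range(y_min + pad_v, y_max - pad_v + 1)
    count = len(xs) * len(ys)
    if count == 0:
        raise ValueError("padding leaves no pixels to sample")
    totals = [0, 0, 0]
    for y in ys:
        for x in xs:
            for i, channel in enumerate(frame[y][x]):
                totals[i] += channel
    return tuple(total / count for total in totals)


def nearest_ordinal(color, sequence=RGB24_SEQUENCE):
    """Return the ordinal whose identifier color is closest to `color`."""
    def distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(color, candidate))
    return min(range(len(sequence)), key=lambda i: distance(sequence[i]))


# A degraded green block: 6x6 pixels, sampled with a one-pixel border removed.
frame = [[(10, 240, 20)] * 6 for _ in range(6)]
sampled = average_color(frame, 0, 5, 0, 5, pad_h=1, pad_v=1)
ordinal = nearest_ordinal(sampled)
```

For a composite identifier, the same two calls would be repeated once per sub-region and the resulting digits combined into a single ordinal.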
The following diagram depicts the GED analysis procedure:
1.3.2 Error Definitions
GED is capable of detecting three types of frame-level errors: dropped, repeated and out-of-sequence video frames. These errors are termed “gross errors.” Frames may also be defined as “unknown” if they do not fit into any of the previous categories. Errors are defined by the relationship between the decoded ordinal of the current frame and the ordinal of the previous frame.
Since ordinals are expected to increment by one for each frame, an ordinal more than one greater than the previous ordinal indicates dropped frames:
$$d = f - p - 1 \quad \text{for } p + 1 < f \le p + s$$
where d is dropped frames, f is the current frame ordinal, and p is the previous frame ordinal
If the current frame’s ordinal is equal to the previous frame’s ordinal, then the frame has been repeated:
$$r = 1 \quad \text{for } f = p$$
where r is repeated frames, f is the current frame ordinal, and p is the previous frame ordinal
Frames whose ordinal is less than the previous frame’s ordinal, but not by more than the total length of the video sequence, are defined as out-of-sequence frames:
$$o = 1 \quad \text{for } p - s \le f < p$$
where o is out-of-sequence frames, f is the current frame ordinal, p is the previous frame ordinal, and s is the video sequence length
Frames whose ordinal is greater than the previous frame’s ordinal by more than the total sequence length, or less than the previous frame’s ordinal by more than the total sequence length, are defined as unknown:
$$u = 1 \quad \text{for } |f - p| > s$$
where u is unknown frames, f is the current frame ordinal, p is the previous frame ordinal, and s is the video sequence length
Frames may also be marked as unknown for other reasons, for instance for frames processed before a start of sequence marker is detected (see 1.2.3).
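The error definitions in this section can be combined into a single classification step. The sketch below is one possible reading, not the GED implementation: the function name, the label strings and the use of `None` for an undecodable or missing ordinal are assumptions, and wrap-around of the ordinal sequence is not handled.

```python
def classify(current, previous, sequence_length):
    """Classify a frame from its ordinal and the previous frame's ordinal.

    Returns a (label, count) pair, where count is the number of frames
    the error accounts for.
    """
    if current is None or previous is None:
        return ("unknown", 1)
    delta = current - previous
    if abs(delta) > sequence_length:
        return ("unknown", 1)
    if delta == 1:
        return ("ok", 0)
    if delta > 1:
        return ("dropped", delta - 1)  # the frames skipped in between
    if delta == 0:
        return ("repeated", 1)
    return ("out_of_sequence", 1)


# Ordinals decoded from a capture of a 100-frame clip.
ordinals = [0, 1, 2, 2, 6, 5, 7]
results = [classify(f, p, 100) for p, f in zip(ordinals, ordinals[1:])]
```

The unknown check comes first so that a very large forward jump is not reported as an implausible number of dropped frames.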
1.3.3 Gross Error Detection Workflow
The typical workflow for the Gross Error Detection methodology is depicted in the following diagram:
GED first instruments uncompressed source clips with frame identifiers. Clips are then assembled, scaled and encoded in the format used for the test. The compressed clip is played through the system under test and the results are captured. (The capture process usually outputs another uncompressed file. Sometimes this file remains compressed or is recompressed, as is the case with PVR applications.) GED then processes the resulting capture file to score the test.
1.3.4 Temporal Alignment
Temporal alignment is the process of adding or removing frames from capture files so that they match the length of the source clip. This process is useful as a pre-processing step for video quality tools that do frame-by-frame comparisons, ensuring that the intended frames from the capture file are matched with the correct frames from the source file.
To perform temporal alignment, GED uses the output from the analysis stage to generate a new video file. If dropped frames are detected, copies of the previous frame are inserted into the new file and marked with the ordinals that were found missing. Any repeated frames are simply deleted. Unknown frames at the beginning of the clip (see 1.2.3) are also deleted.
Since dropped frames represent content that was never delivered, there is no way to ensure a perfect video quality score for capture files that have been temporally aligned by GED. Deleting repeated frames typically will have no effect on the quality score.
Temporal alignment cannot be accurately performed on capture files that contain out-of-sequence video frames.
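The alignment rules above can be sketched in Python as follows. This is an illustrative outline under stated assumptions: the capture is modeled as a list of (ordinal, frame) pairs, `None` stands for an unknown ordinal, and the function name is hypothetical. It does not pad the output when frames are missing at the very start or end of the clip.

```python
def temporally_align(captured):
    """Rebuild a capture so that ordinals increase by one per frame.

    `captured` is a list of (ordinal, frame) pairs in capture order.
    """
    aligned = []
    previous = None
    for ordinal, frame in captured:
        if ordinal is None:
            if previous is None:
                continue  # unknown frames at the start are deleted
            raise ValueError("unknown frame inside the sequence")
        if previous is not None:
            if ordinal == previous[0]:
                continue  # repeated frames are deleted
            if ordinal < previous[0]:
                raise ValueError("out-of-sequence frame; cannot align")
            # Fill each dropped frame with a copy of the previous frame,
            # marked with the ordinal that was found missing.
            for missing in range(previous[0] + 1, ordinal):
                aligned.append((missing, previous[1]))
        aligned.append((ordinal, frame))
        previous = (ordinal, frame)
    return aligned


capture = [(None, "x"), (0, "a"), (1, "b"), (1, "b"), (4, "e"), (5, "f")]
aligned = temporally_align(capture)
```

In this example frames 2 and 3 were dropped, so the aligned output carries two copies of frame "b" under ordinals 2 and 3, and the duplicate of ordinal 1 is removed.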
2. Video GED Application
Intel has developed the Video Gross Error Detector application as a reference implementation of the GED methodology outlined above.
The Video GED supports instrumentation and error detection using simple frame identifiers or 3x3 color block arrays (high-resolution mode). Start of sequence markers and temporal alignment are supported for high-resolution mode.
The Video GED generates mean opinion scores based on Intel’s research. (MOS scores reflect an expected end user opinion of the quality of the content delivery.) The following table defines the MOS scores generated by the Video GED:
The Video GED can impair source clips with specified numbers of gross errors. This feature is helpful for generating test media for subjective assessments (experiments designed to correlate end user perceptions of playback smoothness with objective GED scores).
The Video GED is provided under a free binary distribution license. While Intel retains ownership of the GED, it may be freely redistributed.
3. GED Advantages
Automated tools are much more efficient than manual review by video experts, which can be time-consuming and expensive. Calibrating the objective metrics to end user opinions allows a tool such as GED to estimate the expected user experience. This greatly facilitates iterative testing during the development process, where small incremental changes in the system under test can be rapidly evaluated. Expert review can be reserved as a final check on major product revisions.
In addition to automating the detection of frame-level errors, GED frame identifiers provide a way to tell otherwise identical frames apart, something that is difficult for reference tools to do. For instance, a sequence of black frames can contain dropped frames that are very difficult for even an expert viewer to detect. It’s also exceedingly difficult to spot dropped or repeated frames in scenes with very low motion or in some types of animation.
3.2 Platform Independence
Since it operates directly on capture files, GED achieves a high level of platform independence. For instance, it can compare the user experience impact of various display technologies, network transports, operating systems, streaming applications, media players and compression formats. Instrumenting video data directly removes any dependence on particular file types, video formats or containers. Content can be transcoded, scaled or captured from analog sources and still be processed by GED.
3.3 Content Independence
The GED methodology is content independent. Since GED always looks for the same frame identifiers regardless of the video under test, it is easy to automate testing of a wide range of video clips. No calibration is required to compare the playback smoothness of clips with one type of content to clips with another type of content.
3.4 Performance and Simplicity of Implementation
GED is significantly faster than reference video quality tools. Frame identifier analysis is an I/O bound process (for uncompressed video). For example, with a small array of 4 drives, standard definition video can be processed at 20x real-time or faster. (A two hour movie would take six minutes.) The GED can quickly determine if the expected content has been delivered properly before subjecting high resolution video clips to more detailed analysis.
4. Download Application
- Video Gross Error Detection [1.2 MB EXE] Updated 12/5/07