Compression for High-Quality, High Bandwidth Video

 

Introduction

This article provides an introduction to video compression and decompression algorithms, including two popular specifications for video compression, and the handling of video compression in the Intel® Integrated Performance Primitives (Intel® IPP).


Overview of Coding

Image and video encoders and decoders, in software called codecs, are intended to compress their media for storage or transmission. Raw images are quite large, and with present technology raw digital video is almost unworkable. Moreover, with processors as they are, working with these media uncompressed is unnecessary and inefficient for anything other than capture and display; it is faster to read compressed video from disk and decompress it than to read the uncompressed video.

Most compression is based on taking advantage of redundancy and predictability in data to reduce the amount of information necessary to represent it. Two common techniques are run-length coding, which converts runs of data into run-lengths and values, and variable-length coding, which converts data of fixed bit lengths into variable bit lengths according to popularity. Huffman coding and arithmetic coding are examples of variable-length coding.
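As a minimal illustration of run-length coding (a sketch in plain C, not an Intel IPP function), the following routine collapses runs of identical bytes into (run length, value) pairs; a variable-length coder would then assign shorter bit codes to the more frequent pairs:

/* Minimal run-length encoder sketch: emits (run length, value) pairs.
   Illustrative only; real coders bound run lengths and entropy-code the pairs. */
#include <stddef.h>

size_t rle_encode(const unsigned char *src, size_t len,
                  unsigned char *dst /* at least 2*len bytes */)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ) {
        unsigned char value = src[i];
        size_t run = 1;
        while (i + run < len && src[i + run] == value && run < 255)
            run++;
        dst[out++] = (unsigned char)run;  /* run length */
        dst[out++] = value;               /* repeated value */
        i += run;
    }
    return out;  /* number of bytes written */
}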

Another source of compression is exploiting the limits of perceptibility. Obviously, for some kinds of data, such as text and binary executables, compression must be lossless; a compression method that sometimes changed an "a" to an "A" would not be acceptable. Stand-alone Huffman coding is exactly reversible. However, it is possible to compress media information in a way that is not exactly reversible but is virtually undetectable. Such methods are called lossy: the output is not guaranteed to be exactly the same as the input, but in many cases the loss is imperceptible or has a manageable visual effect. Just as with audio coding, the compression algorithm transforms the data into spaces in which information can be removed while minimizing the perceptible impact to the media.

Most media compression is done using transform-based coding methods. Such methods convert the position-based information into frequency-based or position/frequency-based information. The compression benefit is that important information becomes concentrated in fewer values. Then the coder can represent the more-important information with more bits and the less-important information with fewer bits. The perception model dictates the importance of information, but generally higher-frequency information is considered less important.

Figure 1 shows the framework of a transform-based encoding and decoding scheme.

Figure 1 Simple Diagram of Transform-Based Image Coding

Compression schemes for video usually try to take advantage of a second source of redundancy, repetition between frames of video. The coder either encodes raw frames of video or encodes the difference, often compensated for motion, between successive frames.

Coding in Intel® Integrated Performance Primitives (Intel® IPP)

Intel IPP support for video compression is very similar to its support for JPEG, and takes several forms. Intel IPP provides portions of codecs and includes samples that are partial codecs for several compression algorithms. In particular, it includes:

  • General functions such as transforms and arithmetic operations that are applicable across one or more compression algorithms.
  • Specific functions such as Huffman coding for JPEG that you can think of as “codec slices”. At present, Intel IPP provides such functions for MPEG-1, MPEG-2, MPEG-4, DV, H.263, and H.264.
  • Sample encoders and decoders of several major video standards, including MPEG-2, MPEG-4, and H.264.
  • Universal Media Classes (UMC) that wrap these codecs into a platform-neutral video decode and display pipeline.

The subsequent sections explain each of these elements for the two algorithms MPEG-2 and H.264, and describe and give examples of UMC. The explanation covers the above categories of support, leaning heavily on examples from the codec samples.

MPEG-2

This section describes the video portion of the MPEG-2 standard.

MPEG-2 is intended for high-quality, high-bandwidth video. It is most prominent because it is used for DVD and HDTV video compression. Computationally, good encoding is expensive but can be done in real time by current processors. Decoding an MPEG-2 stream is relatively easy and can be done by almost any current processor or, obviously, by commercial DVD players.

MPEG-2 players must also be able to play MPEG-1. MPEG-1 is very similar, though the bit stream differs and the motion compensation has less resolution. It is used as the video compression on VCDs.

MPEG-2 is a complicated format with many options. It includes seven profiles dictating aspect ratios and feature sets, four levels specifying resolution, bit rate, and frame rate, and three frame types. The bit stream code is complex and requires several tables. However, at its core are computationally complex but conceptually clear compression and decompression elements. These elements are the focus of this section.

MPEG-2 Components

MPEG-2 components are very similar to those in JPEG. MPEG-2 is DCT based, and uses Huffman coding on the quantized DCT coefficients. However, the bit stream format is completely different, as are all the tables. Unlike JPEG, MPEG-2 also has a restricted, though very large, set of frame rates and sizes. But the biggest difference is the exploitation of redundancy between frames.

There are three types of frames in MPEG: I (intra) frames, P (predicted) frames, and B (bidirectional) frames. There are several consequences of frame type, but the defining characteristic is how prediction is done. Intra frames do not refer to other frames, making them suitable as key frames. They are, essentially, self-contained compressed images. By contrast, P frames are predicted by using the previous P or I frame, and B frames are predicted using the previous and next P or I frame. Individual blocks in these frames may be intra or non-intra, however.

MPEG is organized around a hierarchy of blocks, macroblocks, slices, and frames. Blocks are 8 pixels high by 8 pixels wide in a single channel. Macroblocks are a collection of blocks 16 pixels high by 16 pixels wide and contain all three channels. Depending on subsampling, a macroblock contains 6, 8, or 12 blocks. For example, a YCbCr 4:2:0 macroblock has four Y blocks, one Cb and one Cr.
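The block counts above follow directly from the subsampling; a small hypothetical helper (not part of Intel IPP) makes the arithmetic explicit:

/* Blocks per 16x16 macroblock: 4 luma blocks plus chroma blocks that
   depend on subsampling. Hypothetical helper for illustration only. */
typedef enum { CHROMA_420, CHROMA_422, CHROMA_444 } ChromaFormat;

static int blocks_per_macroblock(ChromaFormat fmt)
{
    switch (fmt) {
    case CHROMA_420: return 4 + 1 + 1;   /* 4 Y, 1 Cb, 1 Cr  ->  6 */
    case CHROMA_422: return 4 + 2 + 2;   /* 4 Y, 2 Cb, 2 Cr  ->  8 */
    case CHROMA_444: return 4 + 4 + 4;   /* 4 Y, 4 Cb, 4 Cr  -> 12 */
    default:         return 0;
    }
}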

Following are the main blocks of an MPEG-2 codec, in encoding order. Figure 2 shows how these blocks relate to one another.

Motion Estimation and Compensation

The key to the effectiveness of video coding is using earlier and sometimes later frames to predict a value for each pixel. Image compression can only use a block elsewhere in the image as a base value for each pixel, but video compression can aspire to use an image of the same object. Instead of compressing pixels, which have high entropy, the video compression can compress the differences between similar pixels, which have much lower entropy.

Objects and even backgrounds in video are not reliably stationary, however. In order to make these references to other video frames truly effective, the codec needs to account for motion between the frames. This is accomplished with motion estimation and compensation. Along with the video data, each block also has motion vectors that indicate how far that block has moved relative to a reference image. Before taking the difference between the current and reference data, the codec shifts the reference block by that amount. Calculating the motion vectors is called motion estimation, and accommodating this motion is called motion compensation.

This motion compensation is an essential and computationally expensive component of video compression. In fact, the biggest difference between MPEG-1 and MPEG-2 is the change from full-pel to half-pel accuracy. This modification makes a significant difference in quality at a given data rate, but also makes MPEG-2 encoding very time-consuming.
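To make the idea concrete, the following plain-C sketch (not the Intel IPP API) computes the residual for a 16 x 16 macroblock using a full-pel motion vector; half-pel accuracy would additionally interpolate between neighboring reference pixels:

/* Compute the 16x16 residual between the current macroblock and a
   motion-compensated reference block (full-pel vectors only).
   The caller must ensure the vector stays inside the reference frame. */
static void motion_compensated_diff(const unsigned char *cur, int curStep,
                                    const unsigned char *ref, int refStep,
                                    int mv_x, int mv_y,   /* full-pel motion vector */
                                    short *residual)      /* 16x16 output, row-major */
{
    const unsigned char *pred = ref + mv_y * refStep + mv_x;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            residual[y * 16 + x] =
                (short)(cur[y * curStep + x] - pred[y * refStep + x]);
}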

DCT

Like JPEG, MPEG is DCT-based. The codec calculates a DCT on each 8 x 8 block of pixel or difference information of each image. The frequency information is easier to sort by visual importance and quantize, and it takes advantage of regions of each frame that are unchanging.

Figure 2 High-Level MPEG-2 Encoder and Decoder Blocks


Quantization

Quantization in MPEG is different for different block types. There are different matrices of coefficient-specific values for intra and non-intra macroblocks, as well as for color and intensity data. There is also a scale applied across all matrices. Both the scale and the quantization matrix can change each macroblock.

For intra blocks, the DC, or zero-frequency, coefficient is quantized by dropping the low 0 to 3 bits; that is, by shifting it right by zero to three bits. The AC coefficients are assigned into quantization steps according to the global scale and the matrix. The quantization is linear.

For non-intra blocks, the DC component contains less important information and is more likely to tend toward zero. Therefore, the DC and AC components are quantized in the same way, using the non-intra quantization matrix and scale.
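The following plain-C sketch summarizes both cases, assuming an 8 x 8 block of DCT coefficients in natural order; the standard's exact rounding rules are more involved, so treat this as illustrative only:

/* Simplified MPEG-2 quantization sketches for one 8x8 block of DCT
   coefficients in natural order. Illustrative only.
   wm       - 8x8 quantization matrix (intra or non-intra)
   scale    - global quantizer scale
   dc_shift - 0..3, bits dropped from the intra DC coefficient */
static void quantize_intra_block(short coef[64], const unsigned char wm[64],
                                 int scale, int dc_shift)
{
    coef[0] >>= dc_shift;                       /* DC: drop 0..3 low bits */
    for (int i = 1; i < 64; i++) {              /* AC: linear quantization */
        int step = wm[i] * scale;
        coef[i] = (short)((16 * coef[i]) / step);
    }
}

static void quantize_inter_block(short coef[64], const unsigned char wm[64],
                                 int scale)
{
    for (int i = 0; i < 64; i++) {              /* DC and AC treated alike */
        int step = wm[i] * scale;
        coef[i] = (short)((16 * coef[i]) / step);
    }
}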

Huffman Coding

In order for reduced entropy in the video data to become a reduced data rate in the bit stream, the data must be coded using fewer bits. In MPEG, as with JPEG, that means a Huffman variable-length encoding scheme. Each piece of data is encoded with a code the length of which is inversely related to its frequency. Because of the complexity of MPEG-2, there are dozens of tables of codes for coefficients, block types, and other information.

For intra blocks, the DC coefficient is not coded directly. Instead, the difference between it and a predictor is used. This predictor is either the DC value of the last block if present and intra, or a constant average value otherwise.
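Either side of the codec keeps this predictor as it works through the blocks; a hedged decoder-side sketch (illustrative, not from the Intel IPP sample):

/* DC prediction for intra blocks: the bit stream carries only the
   difference from the previous intra DC value. Sketch for illustration. */
static int predict_dc(int dc_diff, int *dc_predictor,
                      int prev_block_was_intra, int reset_value)
{
    if (!prev_block_was_intra)
        *dc_predictor = reset_value;   /* constant average value per the standard */
    *dc_predictor += dc_diff;          /* accumulate the decoded difference */
    return *dc_predictor;              /* reconstructed DC coefficient */
}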

Two scan matrices are used to order the DCT coefficients. One does a zig-zag pattern that is close to diagonally symmetric for blocks that are not interlaced; the other does a modified zig-zag for interlaced blocks. These matrices put the coefficients in order of increasing frequency in an attempt to maximize lengths of runs of data.

The encoder codes run-level data for this matrix. Each run-level pair represents a run of zero coefficients followed by a non-zero value, the level. The more common pairs have codes in a Huffman table. Less common pairs, such as those with runs of more than 31, are encoded as an escape code followed by a 6-bit run and a 12-bit level.
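Putting the scan and run-level steps together, a simplified encoder-side sketch (plain C, with escape coding omitted) walks the quantized coefficients in scan order and emits a (run, level) pair for each non-zero value:

/* Emit (run, level) pairs for one quantized 8x8 block.
   'scan' is the zig-zag (or alternate) scan matrix; runs count the zeros
   preceding each non-zero level. Escape coding for rare pairs is omitted. */
typedef struct { int run; int level; } RunLevel;

static int run_level_encode(const short coef[64], const unsigned char scan[64],
                            RunLevel out[64])
{
    int run = 0, n = 0;
    for (int i = 1; i < 64; i++) {          /* AC coefficients only */
        short level = coef[scan[i]];
        if (level == 0) {
            run++;
        } else {
            out[n].run = run;
            out[n].level = level;
            n++;
            run = 0;
        }
    }
    return n;   /* number of pairs; trailing zeros become an end-of-block code */
}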

MPEG-2 in Intel IPP

Intel IPP provides a very efficient sample encoder and decoder for MPEG-2. Due to the number of variants, it is only a sample and not a compliant codec.

Each side of the codec includes hundreds of Intel IPP function calls. The bulk of the code in the sample is for bit stream parsing and data manipulation, but the bulk of the time is spent decoding the pixels. For this reason, almost all of the Intel IPP calls are concentrated in the pixel decoding blocks. In particular, the key high-level functions are the member functions of the class MPEG2VideoDecoderBase:

 

DecodeSlice_FrameI_420
DecodeSlice_FramePB_420
DecodeSlice_FieldPB_420
DecodeSlice_FrameI_422
DecodeSlice_FramePB_422
DecodeSlice_FieldPB_422

These functions decode the structure of the image, then pass the responsibility for decoding individual blocks to a function such as ippiDecodeIntra8x8IDCT_MPEG2_1u8u. Figure 3 shows the key portions of two of these functions.

 

Status MPEG2VideoDecoderBase::DecodeSlice_FrameI_420(
    IppVideoContext *video)
{
    ...
        DECODE_VLC(macroblock_type, video->bs, vlcMBType[0]);

        if (load_dct_type) {
            GET_1BIT(video->bs, dct_type);
        }

        if (macroblock_type & IPPVC_MB_QUANT)
        {
            DECODE_QUANTIZER_SCALE(video->bs,
                video->cur_q_scale);
        }

        if (PictureHeader.concealment_motion_vectors)
        {
            if (PictureHeader.picture_structure !=
                IPPVC_FRAME_PICTURE) {
                SKIP_BITS(video->bs, 1);
            }
            mv_decode(0, 0, video);
            SKIP_BITS(video->bs, 1);
        }

        RECONSTRUCT_INTRA_MB_420(video->bs, dct_type);
    }
}//DecodeSlice_FrameI_420

#define RECONSTRUCT_INTRA_MB_420(BITSTREAM, DCT_TYPE)
    RECONSTRUCT_INTRA_MB(BITSTREAM, 6, DCT_TYPE)

#define RECONSTRUCT_INTRA_MB(BITSTREAM, NUM_BLK, DCT_TYPE)
{ ...
    for (blk = 0; blk < NUM_BLK; blk++) {
        sts = ippiDecodeIntra8x8IDCT_MPEG2_1u8u( ... );
    }
}

Status MPEG2VideoDecoderBase::DecodeSlice_FramePB_420(
    IppVideoContext *video)
{
    ...
            if (video->prediction_type == IPPVC_MC_DP) {
                mc_dualprime_frame_420(video);
            } else {
                mc_frame_forward_420(video);
                if (video->macroblock_motion_backward) {
                    mc_frame_backward_add_420(video);
                }
            }
        } else {
            if (video->macroblock_motion_backward) {
                mc_frame_backward_420(video);
            } else {
                RESET_PMV(video->PMV)
                mc_frame_forward0_420(video);
            }
        }

        if (macroblock_type & IPPVC_MB_PATTERN) {
            RECONSTRUCT_INTER_MB_420(video->bs, dct_type);
        }
    }

    return UMC_OK;
}//DecodeSlice_FramePB_420

void MPEG2VideoDecoderBase::mc_frame_forward0_422(
    IppVideoContext *video)
{
    MC_FORWARD0(16, frame_buffer.Y_comp_pitch,
        frame_buffer.U_comp_pitch);
}

#define MC_FORWARD0(H, PITCH_L, PITCH_C)
    ...
    ippiCopy16x16_8u_C1R(ref_Y_data + offset_l, PITCH_L,
        cur_Y_data + offset_l, PITCH_L);
    ippiCopy8x##H##_8u_C1R(ref_U_data + offset_c, PITCH_C,
        cur_U_data + offset_c, PITCH_C);
    ippiCopy8x##H##_8u_C1R(ref_V_data + offset_c, PITCH_C,
        cur_V_data + offset_c, PITCH_C);

#define RECONSTRUCT_INTER_MB_420(BITSTREAM, DCT_TYPE)
    RECONSTRUCT_INTER_MB(BITSTREAM, 6, DCT_TYPE)

#define RECONSTRUCT_INTER_MB(BITSTREAM, NUM_BLK, DCT_TYPE)
    ...
    for (blk = 0; blk < NUM_BLK; blk++) {
        ...
        sts = ippiDecodeInter8x8IDCTAdd_MPEG2_1u8u(...);

Figure 3 Structure of MPEG-2 Intra Macroblock Decoding

For decoding, two Intel IPP function groups execute most of the decoding pipeline. Between them they implement a large portion of an MPEG-2 decoder, at least for intra blocks.

The first group is ippiReconstructDCTBlock_MPEG2 for non-intra blocks and ippiReconstructDCTBlockIntra_MPEG2 for intra blocks. These functions decode Huffman data, rearrange it, and dequantize it. The source is the Huffman-encoded bit stream pointing to the top of a block and the destination is an 8 x 8 block of consecutive DCT coefficients.

The Huffman decoding uses separate tables for AC and DC codes, formatted in the appropriate Intel IPP Spec structure. The scan matrix argument specifies the zigzag pattern to be used. The functions also take two arguments for the quantization, a matrix and a scale factor. Each element is multiplied by the corresponding element in the quantization matrix, then by the global scale factor.

The function ReconstructDCTBlockIntra also takes two arguments for processing the DC coefficient: the reference value and the shift. The function adds the reference value, which is often taken from the last block, to the DC coefficient. The DC coefficient is shifted by the shift argument, which should be zero to three bits as indicated above.

The second main function is the inverse DCT. The two most useful DCT functions are ippiDCT8x8InvLSClip_16s8u_C1R for intra blocks and ippiDCT8x8Inv_16s_C1R for non-intra blocks. The versions without level-shift and clipping can also be used. The former function inverts the DCT on an 8 x 8 block, then converts the data to Ipp8u with a level shift; the output values are pixels. The latter function inverts the DCT and leaves the result in Ipp16s; the output values are difference values. The decoder must then add these difference values to the motion-compensated reference block.

Figure 4 shows these function groups decoding a 4:2:0 intra macroblock. The input is a bit stream and several pre-calculated tables. The DCT outputs the pixel data directly in an image plane. The four blocks of Y data are arrayed in a 2 x 2 square in that image, and the U and V blocks are placed in analogous locations in the U and V planes. This output can be displayed directly by a display that supports the format, or the U and V planes can be upsampled to make a YCbCr 4:4:4 image, or the three planes can be converted by other Intel IPP functions to RGB for display.

 

ippiReconstructDCTBlockIntra_MPEG2_32s(
    &video->bitstream_current_data,
    &video->bitstream_bit_ptr,
    pContext->vlcTables.ippTableB5a,
    pContext->Table_RL,
    scan_1[pContext->PictureHeader.alternate_scan],
    q_scale[pContext->PictureHeader.q_scale_type]
        [pContext->quantizer_scale],
    video->curr_intra_quantizer_matrix,
    &pContext->slice.dct_dc_y_past,
    pContext->curr_intra_dc_multi,
    pContext->block.idct, &dummy);

ippiReconstructDCTBlockIntra_MPEG2_32s( ...
    pContext->block.idct+64, &dummy);
// Repeat two more times for other Y blocks
ippiReconstructDCTBlockIntra_MPEG2_32s(…)

VIDEO_FRAME_BUFFER* frame =
    &video->frame_buffer.frame_p_c_n
        [video->frame_buffer.curr_index];

// Inverse DCT and place in 16x16 block of image
ippiDCT8x8InvLSClip_16s8u_C1R(
    pContext->block.idct,
    frame->Y_comp_data + pContext->offset_l,
    pitch_Y, 0, 0, 255);
ippiDCT8x8InvLSClip_16s8u_C1R(
    pContext->block.idct,
    frame->Y_comp_data + pContext->offset_l + 8,
    pitch_Y, 0, 0, 255);
ippiDCT8x8InvLSClip_16s8u_C1R(
    pContext->block.idct,
    frame->Y_comp_data + pContext->offset_l + 8*pitch_Y,
    pitch_Y, 0, 0, 255);
ippiDCT8x8InvLSClip_16s8u_C1R(
    pContext->block.idct,
    frame->Y_comp_data +
        pContext->offset_l + 8*pitch_Y + 8,
    pitch_Y, 0, 0, 255);

ippiReconstructDCTBlockIntra_MPEG2_32s(
    &video->bitstream_current_data,
    &video->bitstream_bit_ptr,
    pContext->vlcTables.ippTableB5b,
    pContext->Table_RL,
    scan_1[pContext->PictureHeader.alternate_scan],
    q_scale[pContext->PictureHeader.q_scale_type]
        [pContext->quantizer_scale],
    video->curr_chroma_intra_quantizer_matrix,
    &pContext->slice.dct_dc_cb_past,
    pContext->curr_intra_dc_multi,
    pContext->block.idct, &i1);

ippiReconstructDCTBlockIntra_MPEG2_32s( ...
    &pContext->slice.dct_dc_cr_past,
    pContext->curr_intra_dc_multi,
    pContext->block.idct + 64, &i2);

ippiDCT8x8InvLSClip_16s8u_C1R(
    pContext->block.idct,
    frame->U_comp_data + pContext->offset_c,
    pitch_UV, 0, 0, 255);

ippiDCT8x8InvLSClip_16s8u_C1R(
    pContext->block.idct + 64,
    frame->V_comp_data + pContext->offset_c,
    pitch_UV, 0, 0, 255);

Figure 4 Decoding an MPEG-2 Intra Macroblock

The dummy parameter to the first ippiReconstructDCTBlock call is not used here but can be used for optimization. If the value returned is 1, then only the DC coefficient is nonzero and the inverse DCT can be skipped. If it is less than 10, then all the nonzero coefficients are in the first 4 x 4 block, and a 4 x 4 inverse DCT can be used.
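A hedged sketch of how a decoder might act on that returned count; the helper and variable names here are illustrative, not taken from the Intel IPP sample:

// Illustrative dispatch on the non-zero-coefficient count returned by
// ippiReconstructDCTBlockIntra_MPEG2; assumes the IPP headers for the
// Ipp8u/Ipp16s types and the DCT prototypes.
static void idct_intra_block(Ipp16s* pCoef, int count, Ipp8u* pDst, int dstStep)
{
    if (count == 1) {
        // Only the DC coefficient is non-zero: the block is flat, so skip
        // the inverse DCT and fill the 8x8 output with the scaled DC value.
        int dc = pCoef[0] / 8;                 // only the DC basis contributes
        if (dc < 0) dc = 0; else if (dc > 255) dc = 255;
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
                pDst[y * dstStep + x] = (Ipp8u)dc;
    } else if (count < 10) {
        // All non-zero coefficients lie in the top-left 4x4 corner.
        Ipp16s buffer[64];
        ippiDCT8x8Inv_4x4_16s_C1(pCoef, buffer);
        // ...level-shift and clip 'buffer' into the destination image...
    } else {
        ippiDCT8x8InvLSClip_16s8u_C1R(pCoef, pDst, dstStep, 0, 0, 255);
    }
}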

The ippiDCT8x8Inv_16s8u_C1R function could be called instead of ippiDCT8x8InvLSClip_16s8u_C1R, because the data is clipped to the 0–255 range by default.

In the non-intra case, the pointer to the quantization matrix can be 0. In that case, the default matrices will be used.

Figure 5 shows another approach to decoding, from the MPEG-2 sample for Intel IPP 5.2. Instead of using the ippiReconstructDCTBlock function for decoding, it implements a pseudo-IPP function called ippiDecodeIntra8x8IDCT_MPEG2_1u8u. This function performs almost the entire decoding pipeline, from variable-length decoding through motion compensation.

Within this function, much of the decoding is done directly in C++, largely using macros and state logic. The Huffman decoding in this sample is done in C++ using macros, and the quantization is done in C++ on each coefficient as it is decoded. The motion compensation is done along with the DCT in one of the DCT macros.

This function uses several DCT functions. Most of the DCTs are done by two useful functions, ippiDCT8x8Inv_16s8u_C1R and ippiDCT8x8Inv_16s_C1R, for intra blocks and inter blocks, respectively. The former function converts the output to Ipp8u, because for intra blocks those values are pixels. The latter function leaves the result in Ipp16s, because the output values are difference values to be added to the motion-compensated reference block. The sample also uses other DCT functions, such as the specialized function ippiDCT8x8Inv_AANTransposed, which assumes that the coefficients are transposed and in zigzag order and accommodates implicit zero coefficients at the end. For blocks that are mostly zeros, the decoder also uses the function ippiDCT8x8Inv_4x4_16s_C1.

 

MP2_FUNC(IppStatus, ippiDecodeInter8x8IDCTAdd_MPEG2_1u8u, (
    Ipp8u** BitStream_curr_ptr,
    Ipp32s* BitStream_bit_offset,
    IppiDecodeInterSpec_MPEG2* pQuantSpec,
    Ipp32s quant,
    Ipp8u* pSrcDst,
    Ipp32s srcDstStep))
{
    // VLC decode & dequantize for one block
    for (;;) {
        if ((code & 0xc0000000) == 0x80000000) {
            break;
        } else if (code >= 0x08000000) {
            tbl = MPEG2_VLC_TAB1[UHBITS(code - 0x08000000, 8)];
common:
            i++;
            UNPACK_VLC1(tbl, run, val, len)

            i += run;
            i &= 63; // just in case
            j = scanMatrix[i];

            q = pQuantMatrix[j];
            val = val * quant;
            val = (val * q) >> 5;
            sign = SHBITS(code << len, 1);
            APPLY_SIGN(val, sign);
            SKIP_BITS(BS, (len+1));
            pDstBlock[j] = val;
            mask ^= val;
            SHOW_HI9BITS(BS, code);
            continue;
        } else if (code >= 0x04000000) {
            ...
        }
    }

    ...

    pDstBlock[63] ^= mask & 1;
    SKIP_BITS(BS, 2);
    COPY_BITSTREAM(*BitStream, BS)

    IDCT_INTER(pDstBlock, i, idct, pSrcDst, srcDstStep);

    return ippStsOk;
}

#define FUNC_DCT8x8      ippiDCT8x8Inv_16s_C1
#define FUNC_DCT4x4      ippiDCT8x8Inv_4x4_16s_C1
#define FUNC_DCT2x2      ippiDCT8x8Inv_2x2_16s_C1
#define FUNC_DCT8x8Intra ippiDCT8x8Inv_16s8u_C1R
#define FUNC_ADD8x8      ippiAdd8x8_16s8u_C1IRS

#define IDCT_INTER(SRC, NUM, BUFF, DST, STEP)
    if (NUM < 10) {
        if (!NUM) {
            IDCTAdd_1x1to8x8(SRC[0], DST, STEP);
        } else
            IDCT_INTER_1x4(SRC, NUM, DST, STEP)
        /*if (NUM < 2) {
            FUNC_DCT2x2(SRC, BUFF);
            FUNC_ADD8x8(BUFF, 16, DST, STEP);
        } else*/ {
            FUNC_DCT4x4(SRC, BUFF);
            FUNC_ADD8x8(BUFF, 16, DST, STEP);
        }
    } else {
        FUNC_DCT8x8(SRC, BUFF);
        FUNC_ADD8x8(BUFF, 16, DST, STEP);
    }

Figure 5 Alternate MPEG-2 Decoding on an Inter Macroblock

The Intel IPP DCT functions also support an alternative layout for YUV data, a hybrid layout in which there are two planes, Y and UV. The UV plane consists of interleaved U and V data, so there is one 16 x 8 block of UV data per macroblock. The Intel IPP functions ippiDCT8x8Inv_AANTransposed_16s_P2C2R (for inter frames) and ippiDCT8x8Inv_AANTransposed_16s8u_P2C2R (for intra frames) support this layout, and the ippiMC16x8UV_8u_C1 and ippiMC16x8BUV_8u_C1 functions support motion compensation on it.
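A small sketch of addressing in this two-plane layout (an NV12-style arrangement; the structure and field names are hypothetical): for the macroblock at column mb_x and row mb_y of a 4:2:0 frame, the luma block is 16 x 16 in the Y plane and the chroma block is 16 x 8 in the interleaved UV plane.

/* Compute the start of a macroblock's data in a two-plane (Y + interleaved UV)
   4:2:0 frame. Illustrative only; the structure and field names are hypothetical. */
typedef struct {
    unsigned char *y;   /* Y plane,  yStep bytes per row  */
    unsigned char *uv;  /* UV plane, uvStep bytes per row, U and V interleaved */
    int yStep, uvStep;
} TwoPlaneFrame;

static void macroblock_addresses(const TwoPlaneFrame *f, int mb_x, int mb_y,
                                 unsigned char **pY, unsigned char **pUV)
{
    *pY  = f->y  + (mb_y * 16) * f->yStep  + mb_x * 16;  /* 16x16 luma block */
    *pUV = f->uv + (mb_y * 8)  * f->uvStep + mb_x * 16;  /* 16x8 interleaved UV block */
}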

On the encoding side, functions are mostly analogous to each of the decode functions listed above. For intra blocks, the forward DCT function ippiDCT8x8Fwd_8u16s_C1R converts a block of Ipp8u pixels into Ipp16s DCT coefficients. Then the function ippiQuantIntra_MPEG2 performs quantization, and the function ippiPutIntraBlock calculates the run-level pairs and Huffman encodes them. The parameters for these last two functions are very similar to those for their decoding counterparts.

For inter blocks, the function ippiDCT8x8Fwd_16s_C1R converts the difference information into DCT coefficients, the function ippiQuant_MPEG2 quantizes, and the function ippiPutNonIntraBlock calculates and encodes the run-level pairs.
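To make the order of operations concrete, here is a plain-C sketch of the intra encode path. It reuses the helper sketches from earlier in this article and declares a placeholder forward DCT; the real Intel IPP functions take additional table and specification arguments, so this is only a conceptual outline:

/* Conceptual intra-block encode path mirroring the function order above.
   quantize_intra_block, run_level_encode, and RunLevel are the sketches
   shown earlier; the forward DCT is declared only as a placeholder
   standing in for ippiDCT8x8Fwd_8u16s_C1R. */
void forward_dct_8x8(const unsigned char *pixels, int step, short coef[64]);

static int encode_intra_block(const unsigned char *pixels, int step,
                              const unsigned char wm[64],   /* intra quant matrix  */
                              const unsigned char scan[64], /* zig-zag scan matrix */
                              int scale, int dc_shift, RunLevel pairs[64])
{
    short coef[64];
    forward_dct_8x8(pixels, step, coef);             /* cf. ippiDCT8x8Fwd_8u16s_C1R */
    quantize_intra_block(coef, wm, scale, dc_shift); /* cf. ippiQuantIntra_MPEG2    */
    return run_level_encode(coef, scan, pairs);      /* cf. ippiPutIntraBlock       */
}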

Motion Estimation and Compensation

Motion estimation by the encoder is very computationally intensive, since it generally requires repeated evaluation of the effectiveness of candidate motion-compensation vectors. However the candidate motion vectors are chosen, using a fast evaluation function speeds up the algorithm. The Intel IPP functions ippiSAD16x16 and ippiSqrDiff16x16 compare blocks from one frame against motion-compensated blocks in a reference frame: ippiSAD16x16 calculates the sum of absolute differences between the pixels, while ippiSqrDiff16x16 calculates the sum of squared differences. The Intel IPP sample uses the former.
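For reference, the sum of absolute differences that ippiSAD16x16 computes can be written in a few lines of plain C; the optimized Intel IPP version is what makes searching many candidate vectors practical:

/* Sum of absolute differences between a 16x16 source block and a
   motion-compensated reference block; plain-C equivalent of what
   ippiSAD16x16 computes. */
#include <stdlib.h>

static int sad_16x16(const unsigned char *cur, int curStep,
                     const unsigned char *ref, int refStep)
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[y * curStep + x] - ref[y * refStep + x]);
    return sad;
}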

Once the encoder has finished searching the space of possible motion vectors, it can use the many ippiGetDiff functions to find the difference between the current frame and the reference frame after motion compensation.

Both the encoder and decoder need a motion compensation algorithm. Intel IPP-based algorithms can use ippiMC or ippiAdd to combine the reference frame with the decoded difference information. Figure 6 shows such an algorithm for a macroblock from a 4:2:0 B-frame.

 

// Determine whether shift is half or full pel
// in horizontal and vertical directions
// Motion vectors are in half-pels in bitstream
// The bit code generated is:
// FF = 0000b; FH = 0100b; HF = 1000b; HH = 1100b
flag1 = pContext->macroblock.prediction_type |
    ((pContext->macroblock.vector[0] & 1) << 3) |
    ((pContext->macroblock.vector[1] & 1) << 2);
flag2 = pContext->macroblock.prediction_type |
    ((pContext->macroblock.vector[0] & 2) << 2) |
    ((pContext->macroblock.vector[1] & 2) << 1);
flag3 = pContext->macroblock.prediction_type |
    ((pContext->macroblock.vector[2] & 1) << 3) |
    ((pContext->macroblock.vector[3] & 1) << 2);
flag4 = pContext->macroblock.prediction_type |
    ((pContext->macroblock.vector[2] & 2) << 2) |
    ((pContext->macroblock.vector[3] & 2) << 1);

// Convert motion vectors from half-pels to full-pel
// also convert for chroma subsampling
// down, previous frame
vector_luma[1] = pContext->macroblock.vector[1] >> 1;
vector_chroma[1] = pContext->macroblock.vector[1] >> 2;

// right, previous frame
vector_luma[0] = pContext->macroblock.vector[0] >> 1;
vector_chroma[0] = pContext->macroblock.vector[0] >> 2;

// down, subsequent frame
vector_luma[3] = pContext->macroblock.vector[3] >> 1;
vector_chroma[3] = pContext->macroblock.vector[3] >> 2;

// right, subsequent frame
vector_luma[2] = pContext->macroblock.vector[2] >> 1;
vector_chroma[2] = pContext->macroblock.vector[2] >> 2;

offs1 =
    (pContext->macroblock.motion_vertical_field_select[0] +
    vector_luma[1] + pContext->row_l) * pitch_y +
    vector_luma[0] + pContext->col_l;

offs2 =
    (pContext->macroblock.motion_vertical_field_select[1] +
    vector_luma[3] + pContext->row_l) * pitch_y +
    vector_luma[2] + pContext->col_l;

i = ippiMC16x16B_8u_C1(
    ref_Y_data1 + offs1, ptc_y, flag1,
    ref_Y_data2 + offs2, ptc_y, flag3,
    pContext->block.idct, 32,
    frame->Y_comp_data + pContext->offset_l,
    ptc_y, 0);
assert(i == ippStsOk);

offs1 =
    (pContext->macroblock.motion_vertical_field_select[0] +
    vector_chroma[1] + pContext->row_c) * pitch_uv +
    vector_chroma[0] + pContext->col_c;

offs2 =
    (pContext->macroblock.motion_vertical_field_select[1] +
    vector_chroma[3] + pContext->row_c) * pitch_uv +
    vector_chroma[2] + pContext->col_c;

i = ippiMC8x8B_8u_C1(
    ref_U_data1 + offs1, ptc_uv, flag2,
    ref_U_data2 + offs2, ptc_uv, flag4,
    pContext->block.idct+256, 16,
    frame->U_comp_data + pContext->offset_c,
    ptc_uv, 0);
assert(i == ippStsOk);

i = ippiMC8x8B_8u_C1(
    ref_V_data1 + offs1, ptc_uv, flag2,
    ref_V_data2 + offs2, ptc_uv, flag4,
    pContext->block.idct+320, 16,
    frame->V_comp_data + pContext->offset_c,
    ptc_uv, 0);
assert(i == ippStsOk);

Figure 6 MPEG-2 Bidirectional Motion Compensation

The first step is to convert the motion vectors from half-pel accuracy to full-pel accuracy, because the half-pel information is passed into the ippiMC functions as a flag. The code drops the least-significant bit of each motion vector and uses it to generate this flag. The starting point of each reference block is then offset vertically and horizontally by the amount of the motion vector.

Because this code handles bi-directional prediction, the code repeats all these steps for two separate motion vectors and two separate reference frames. This is the last decoding step, so the code places the result directly in the YCbCr output frame.

Color Conversion

The standard Intel IPP color conversion functions include conversions to and from YCbCr 4:2:2, 4:2:0, and 4:4:4. Because they are in the general color conversion set, these functions are called RGBToYUV422 / YUV422ToRGB, RGBToYUV420 / YUV420ToRGB, and RGBToYUV / YUVToRGB. These functions support interleaved and planar YCbCr data. Figure 7 shows a conversion of decoded MPEG-2 pixels into RGB for display.

 

src[0] = frame->Y_comp_data +
    pContext->Video[0].frame_buffer.video_memory_offset;
src[1] = frame->V_comp_data +
    pContext->Video[0].frame_buffer.video_memory_offset/4;
src[2] = frame->U_comp_data +
    pContext->Video[0].frame_buffer.video_memory_offset/4;
srcStep[0] = frame->Y_comp_pitch;
srcStep[1] = pitch_UV;
srcStep[2] = pitch_UV;

ippiYUV420ToRGB_8u_P3AC4R(src, srcStep, video_memory +
    pContext->Video[0].frame_buffer.video_memory_offset/4,
    roi.width<<2, roi);

Figure 7 Converting YCbCr 4:2:0 to RGB for Display

H.264

The two series of video codec nomenclature, H.26x and MPEG-x, overlap. MPEG-2 is named H.262 in the H.26x scheme. Likewise, another popular codec, H.264, is part of MPEG-4 and is also known as MPEG-4 Advanced Video Coding (AVC). Its intent, like that of all of MPEG-4, was to produce video compression of acceptable quality at very low bit rates, around half those of its predecessors MPEG-2 and H.263.

This section describes the H.264 components and how each of those components is implemented in Intel IPP.

H.264 Components

Like its predecessors in the H.26x line, H.264 has two encoding modes for individual video frames, intra and inter. In the former, a frame of video is encoded as a stand-alone image without reference to other images in the sequence. In the latter, the previous and possibly future frames are used to predict the values. Figure 8 shows the high-level blocks involved in intra-frame encoding and decoding of H.264. Figure 9 shows the encoding and decoding process for inter frames.

The remainder of this section explains each of these blocks, in the order in which the encoder would execute them.

Figure 8 Intra-Mode Encoding and Decoding in H.264

 

Figure 9 Inter-Mode Encoding and Decoding in H.264

Motion Estimation and Compensation

Blocks in H.264, whether in inter or intra frames, can be expressed relative to previous and subsequent blocks or frames. In inter frames, this is called motion estimation and is relative to blocks in other frames. This is the source of considerable compression. As with other video compression techniques, this exploits the fact that there is considerably less entropy in the difference between similar blocks than in the absolute values of the blocks. This is particularly true if the difference can be between a block and a constructed block at an offset from that block in another frame.

H.264 has very flexible support for motion estimation. The estimation can choose from 32 other frames as reference images, and is allowed to refer to blocks that have to be constructed by interpolation.

The encoder is responsible for determining a reference image, block, and motion vector.
