MP4/AVC Decode using the Intel® Media SDK


Download Source [Zipfile 13.4MB]

Introduction

The Intel® Media Software Development Kit (Intel® Media SDK) is a great way for developers to reduce the amount of time and effort it takes to create media applications. The complexity of today's formats, combined with the effort required to leverage hardware for decoding and encoding, can quickly bog down even the most productive developer.

The MP4 file format (ISO/IEC 14496-14:2003) is a multimedia container format that is commonly used to store digital video and audio streams. This whitepaper describes the process of decoding MP4 files using the Intel Media SDK. The following code examples build upon the existing DirectShow decode sample filter that ships with the Intel Media SDK. The popular open source application Media Player Classic will be used to load the new filter and manage playback.

The Intel Media SDK's decode filter sample is a DirectShow transform filter derived from the CTransformFilter base class. This class contains one input pin and one output pin, which are used to connect to neighboring filters. The sample decode filter accepts compressed video streams on its input pin, decodes them using the Intel Media SDK, and then outputs uncompressed frames to the next filter via its output pin. Filters communicate across pins by proposing Media Types, which describe the format of the data. The presence of a Media Type allows filters to establish an agreed-upon type and thus form chains of filters all working together to process multimedia files.
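
For reference, a filter derived from CTransformFilter overrides a small set of virtual methods. The sketch below shows the general shape of such a filter; the class name is hypothetical, but the method set comes from the DirectShow base classes:

#include <streams.h>  // DirectShow base classes

class CMyDecodeFilter : public CTransformFilter
{
public:
    // Accept or reject the media type proposed by the upstream filter.
    HRESULT CheckInputType(const CMediaType *mtIn);
    // Verify that the input/output media type combination is workable.
    HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
    // Negotiate buffer count and size with the downstream allocator.
    HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);
    // Propose output media types (for a decoder, uncompressed frame formats).
    HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);
    // Receive a compressed sample, decode it, and deliver the output frame.
    HRESULT Transform(IMediaSample *pIn, IMediaSample *pOut);
};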

Media Types are frequently referred to by their FourCC codes, which are a sequence of four bytes used to identify data formats. Microsoft defines five different FourCC codes to represent H.264 video data:
 

Subtype             FOURCC   Description
MEDIASUBTYPE_AVC1   'AVC1'   H.264 bitstream without start codes.
MEDIASUBTYPE_H264   'H264'   H.264 bitstream with start codes.
MEDIASUBTYPE_h264   'h264'   Equivalent to MEDIASUBTYPE_H264, with a different FOURCC.
MEDIASUBTYPE_X264   'X264'   Equivalent to MEDIASUBTYPE_H264, with a different FOURCC.
MEDIASUBTYPE_x264   'x264'   Equivalent to MEDIASUBTYPE_H264, with a different FOURCC.

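A FourCC is simply four ASCII characters packed into a 32-bit value, least significant byte first. As a quick illustration (MAKEFOURCC is defined in the Windows SDK headers), the subtypes above correspond to the following DWORD values, which also appear in the GUID definitions later in this paper:

DWORD fccAVC1 = MAKEFOURCC('A', 'V', 'C', '1');  // 0x31435641
DWORD fccavc1 = MAKEFOURCC('a', 'v', 'c', '1');  // 0x31637661
DWORD fccH264 = MAKEFOURCC('H', '2', '6', '4');  // 0x34363248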

While all of these media subtypes carry H.264 video data, they differ by the presence (or absence) of the start codes that are embedded within the data. Bitstreams containing a start code sequence are described in Annex B of the ITU-T Rec. H.264 specification. The bitstream is comprised of a sequence of network abstraction layer units (NALUs), and each NALU packet is preceded by a start code equal to 0x00000001. When start codes are present in the bitstream, the upstream filter typically sends the following Media Types:

Major type   MEDIATYPE_Video
Subtypes     MEDIASUBTYPE_H264, MEDIASUBTYPE_h264, MEDIASUBTYPE_X264, or MEDIASUBTYPE_x264
Format type  FORMAT_VideoInfo, FORMAT_VideoInfo2, FORMAT_MPEG2Video, or GUID_NULL


The Intel Media SDK requires H.264 data formatted as an Annex B bitstream, and thus the current version of the sample decode filter only supports the above media subtypes.

On the other hand, the MP4 container format stores data without start codes. Instead of a start code, each NALU packet is prefixed with the length of the NAL unit in bytes. The size of the length field is typically 1, 2, or 4 bytes. When start codes are not present within the bitstream, the upstream filter sends the following Media types:

Major type   MEDIATYPE_Video
Subtype      MEDIASUBTYPE_AVC1
Format type  FORMAT_MPEG2Video


Some splitter filters will convert the MP4 bitstream into Annex B form before sending the data to the decoder, and some do not; the splitter loaded by Media Player Classic did not. To successfully decode an MP4 bitstream using the Intel Media SDK, the decode filter must manipulate the incoming data and reformat it into an Annex B style bitstream prior to sending it to the SDK's decode engine. This whitepaper details the steps necessary to augment the existing sample code to support the AVC1 subtype.
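
Conceptually, the reformatting amounts to walking the buffer one NAL unit at a time and swapping each length prefix for a start code. The following is a minimal standalone sketch of the idea, assuming a big-endian 4-byte length field; it is independent of the filter classes discussed below:

#include <cstdint>
#include <vector>

// Convert a length-prefixed (MP4-style) H.264 buffer into an Annex B bitstream.
std::vector<uint8_t> Mp4ToAnnexB(const uint8_t *data, size_t size)
{
    static const uint8_t startCode[4] = { 0x00, 0x00, 0x00, 0x01 };
    std::vector<uint8_t> out;
    size_t pos = 0;
    while (pos + 4 <= size)
    {
        // Read the big-endian 4-byte NAL unit length.
        uint32_t nalLen = (data[pos] << 24) | (data[pos + 1] << 16) |
                          (data[pos + 2] << 8) | data[pos + 3];
        pos += 4;
        if (nalLen == 0 || nalLen > size - pos)
            break;  // malformed input; stop rather than read past the buffer
        // Replace the length prefix with a start code, then copy the payload.
        out.insert(out.end(), startCode, startCode + 4);
        out.insert(out.end(), data + pos, data + pos + nalLen);
        pos += nalLen;
    }
    return out;
}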

Filter Negotiation

Negotiation is the process where two neighboring filters agree on the media type that they will act upon. The filter connecting to the decoder's input pin is referred to as the "upstream filter", whereas the filter that connects to the decoder's output pin is considered the "downstream filter". In a simple playback scenario, the upstream filter is typically a "splitter" filter that is responsible for separating the audio and video data, and the downstream filter is typically the "renderer" that displays the content.

The following illustrates a completed filter graph:

[Figure: a completed filter graph, with the splitter filter's video output connected to the Intel Media SDK decode filter, which in turn connects to the video renderer]
The negotiation process begins with the upstream filter sending a CMediaType structure to the decoder's CheckInputType function, which simply checks the structure's subtype member to see if it contains a format that is supported.

Media subtypes are actually GUIDs, and the following were defined in the file "Mfx_filter_Guid.h" to support AVC:

DEFINE_GUID(MEDIASUBTYPE_avc1,
    0x31637661, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71);

DEFINE_GUID(MEDIASUBTYPE_AVC1,
    0x31435641, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71);


Note: The subtypes "avc1" and "AVC1" are synonymous. Media Player Classic Home Cinema's splitter filter sends the lowercase (avc1) GUID, whereas other players send the uppercase (AVC1). The sample code that accompanies this paper supports both GUIDs.


The Intel Media SDK H.264 decode sample is a class derived from CDecVideoFilter, allowing multiple decoders (H.264, VC-1, and MPEG-2) to reuse much of the same code.

HRESULT CH264DecVideoFilter::CheckInputType(const CMediaType *mtIn)
{
    CHECK_POINTER(mtIn, E_POINTER);

    if (MEDIASUBTYPE_H264 == *mtIn->Subtype())
    {
        // Annex B bitstream: the default frame constructor is sufficient.
    }
    else if (MEDIASUBTYPE_AVC1 == *mtIn->Subtype() || MEDIASUBTYPE_avc1 == *mtIn->Subtype())
    {
        // Length-prefixed bitstream: swap in the AVC frame constructor.
        SAFE_DELETE(m_pFrameConstructor);
        m_pFrameConstructor = new CAVCFrameConstructor;
    }
    else
    {
        return VFW_E_INVALIDMEDIATYPE;
    }

    return CDecVideoFilter::CheckInputType(mtIn);
}

The above code interrogates the proposed media type, and if it detects the type is "AVC1", a new CAVCFrameConstructor object is created. The CAVCFrameConstructor class is responsible for formatting the bitstream data prior to sending it to the decoder. Its functionality is explained below.

After the derived H.264 decoder has completed its checks in CheckInputType, the base class's method is called.

HRESULT CDecVideoFilter::CheckInputType(const CMediaType *mtIn)
{
    …

    // The AVC format MAY send the bitstream's SPS/PPS to the decoder via the
    // MPEG2VIDEOINFO structure (out of band). Use the frame constructor class
    // to parse the data, then hand it to the Media SDK's DecodeHeader().
    if ((MAKEFOURCC('a','v','c','1') == vih2.bmiHeader.biCompression) &&
        (FORMAT_MPEG2_VIDEO == guidFormat))
    {
        MPEG2VIDEOINFO *mp2 = reinterpret_cast<MPEG2VIDEOINFO *>(mtIn->pbFormat);
        CHECK_POINTER(mp2, E_UNEXPECTED);

        if (mp2->cbSequenceHeader > 0)
        {
            ZeroMemory(&avcSPS_PPS, sizeof(mfxBitstream));
            CAVCFrameConstructor *pAVCConstructor =
                dynamic_cast<CAVCFrameConstructor *>(m_pFrameConstructor);
            sts = pAVCConstructor->ReadAVCHeader(mp2, &avcSPS_PPS);
            if (MFX_ERR_NONE == sts)
            {
                sts = m_pDecoder->DecodeHeader(&avcSPS_PPS, &params);
            }
        }
    }

    …
}

The code above does not represent the entire CDecVideoFilter::CheckInputType() function. Only the code required to support AVC1 is included.

When the upstream filter queries the decoder to connect with AVC1, the Media Type will contain a format block laid out as an MPEG2VIDEOINFO structure. This structure contains key elements needed by the Media SDK to initialize the decoding environment. A pointer to this data can be obtained by casting the Media Type's format block to an MPEG2VIDEOINFO pointer, after which the key elements can be extracted.
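
A minimal sketch of that extraction, using the MPEG2VIDEOINFO fields relevant here (the structure is declared in dvdmedia.h), might look like this:

#include <dvdmedia.h>  // MPEG2VIDEOINFO

MPEG2VIDEOINFO *mp2 = reinterpret_cast<MPEG2VIDEOINFO *>(mtIn->pbFormat);
// For AVC1, dwFlags holds the size in bytes of each NAL length field (1, 2, or 4).
DWORD nalLengthSize = mp2->dwFlags;
// cbSequenceHeader is the byte count of the out-of-band SPS/PPS data, and
// dwSequenceHeader marks the start of that data (2-byte length-prefixed NALUs).
DWORD cbSeqHeader = mp2->cbSequenceHeader;
BYTE *pSeqHeader  = reinterpret_cast<BYTE *>(mp2->dwSequenceHeader);
// Frame dimensions come from the embedded VIDEOINFOHEADER2.
LONG width  = mp2->hdr.bmiHeader.biWidth;
LONG height = mp2->hdr.bmiHeader.biHeight;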

The MP4 container may (or may not) contain one or more sequence parameter sets or picture parameter sets (SPS and/or PPS). SPS and PPS packets contain configuration parameters about the media; typically these values are part of the bitstream and are read "in band" right at the start of the decoding process. AVC1 is slightly different in that the SPS and PPS data is sent "out of band" within the MPEG2VIDEOINFO structure rather than being contained within the bitstream itself. The MPEG2VIDEOINFO.dwSequenceHeader field is actually a series of NAL packets that need to be parsed, stored, and then prepended to the bitstream before decoding can begin.

The above code again makes use of the CAVCFrameConstructor class to build an auxiliary bitstream with the SPS/PPS data. We will look at CAVCFrameConstructor in the next section of this paper.

Once the SPS and/or PPS data has been read from the MPEG2VIDEOINFO structure and reformatted into an Annex B style bitstream, it can be passed to the Media SDK's DecodeHeader function. DecodeHeader's purpose is to fill in the mfxVideoParam structure with appropriate values so the Media SDK's initialization function can be called.
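
The sample reaches DecodeHeader through its m_pDecoder wrapper class; for orientation, a minimal sketch of the equivalent flow using the Media SDK's C interface (session setup omitted) would be:

mfxVideoParam params;
memset(&params, 0, sizeof(params));
params.mfx.CodecId = MFX_CODEC_AVC;
params.IOPattern   = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;

// avcSPS_PPS is the Annex B formatted SPS/PPS bitstream built above.
mfxStatus sts = MFXVideoDECODE_DecodeHeader(session, &avcSPS_PPS, &params);
if (MFX_ERR_NONE == sts)
    sts = MFXVideoDECODE_Init(session, &params);  // params now describes the stream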

Frame Constructor Class

In the Intel Media SDK's sample decode filter, the CFrameConstructor class is responsible for formatting the bitstream prior to sending the data to the decode engine. For MP4 support, the new class CAVCFrameConstructor was derived from the base CFrameConstructor class to handle the additional steps required by this container format. Two functions handle the processing of MP4 data: CAVCFrameConstructor::ReadAVCHeader() and CAVCFrameConstructor::ConstructFrame(). Both of these functions rely on the auxiliary class StartCodeIteratorMP4 to navigate the length-prefixed NAL packets. In addition, the derived class relies on functions of the base class to generate an empty bitstream containing a start code and to manage overflow data between decode calls.
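
In outline, the derived class looks roughly like the following (paraphrased for illustration; the member names match those used in the code excerpts below):

class CAVCFrameConstructor : public CFrameConstructor
{
public:
    // Parse the out-of-band SPS/PPS data from the MPEG2VIDEOINFO structure
    // into an Annex B bitstream.
    mfxStatus ReadAVCHeader(MPEG2VIDEOINFO *pMPEG2VidInfo, mfxBitstream *pBS);
    // Convert one length-prefixed media sample into an Annex B bitstream.
    virtual mfxStatus ConstructFrame(IMediaSample *pSample, mfxBitstream *pBS);

protected:
    mfxU32       m_NalSize;        // length-field size for bitstream NALUs (1, 2, or 4)
    mfxU32       m_HeaderNalSize;  // length-field size for header NALUs (fixed at 2)
    mfxBitstream m_Headers;        // saved SPS/PPS, prepended to the first frame
};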

The purpose of ReadAVCHeader() is to gather the SPS and/or PPS data and format it into a bitstream that can be passed to the Intel Media SDK for processing. As stated previously, the SPS/PPS data is not contained within the bitstream itself, but rather in the MPEG2VIDEOINFO structure sent during filter negotiation. It is important to note that the length prefixes for header NAL packets are fixed at 2 bytes (as defined by Microsoft), whereas the length prefixes in the actual bitstream data can be 1, 2, or 4 bytes. The length-prefix size for the bitstream's NAL packets is also conveyed in the MPEG2VIDEOINFO structure, and the following function saves it (in m_NalSize) for future use.

Once the data is saved, the StartCodeIteratorMP4 class is used to step through the data and parse each NAL packet. This is performed in a loop until all NAL packets have been read.
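
The StartCodeIteratorMP4 implementation ships with the sample source. To make its role concrete, a simplified, hypothetical sketch of such an iterator follows; the actual ReadAVCHeader() implementation appears after it.

// Simplified iterator over length-prefixed NAL units (illustrative only;
// not the shipping StartCodeIteratorMP4 implementation).
class SimpleNalIterator
{
public:
    void Init(const BYTE *pData, mfxU32 size, mfxU32 nalLenSize)
    {
        m_p = pData; m_end = pData + size; m_lenSize = nalLenSize;
        m_cur = NULL; m_curLen = 0;
    }

    bool ReadNext()
    {
        if (m_p + m_lenSize > m_end) return false;
        mfxU32 len = 0;  // read the big-endian length prefix
        for (mfxU32 i = 0; i < m_lenSize; i++) len = (len << 8) | *m_p++;
        if (len == 0 || len > (mfxU32)(m_end - m_p)) return false;
        m_cur = m_p; m_curLen = len; m_p += len;
        return true;
    }

    const BYTE *GetDataBuffer() const { return m_cur; }
    mfxU32 GetDataLength() const { return m_curLen; }
    // The H.264 NAL unit type is the low 5 bits of the first payload byte
    // (SPS = 7, PPS = 8).
    mfxU32 GetType() const { return m_cur ? (m_cur[0] & 0x1F) : 0; }

private:
    const BYTE *m_p, *m_end, *m_cur;
    mfxU32 m_lenSize, m_curLen;
};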

mfxStatus CAVCFrameConstructor::ReadAVCHeader(MPEG2VIDEOINFO *pMPEG2VidInfo, mfxBitstream *pBS)
{
    mfxStatus            sts = MFX_ERR_NONE;
    std::vector<mfxU8>   tempBuffer;
    mfxU32               nNalDataLen;
    mfxU8               *pNalDataBuff;
    StartCodeIteratorMP4 m_pStartCodeIter;

    ZeroMemory(&m_Headers, sizeof(mfxBitstream));

    // dwFlags carries the length-field size used by the bitstream's NAL packets.
    m_NalSize = pMPEG2VidInfo->dwFlags;

    // Header NAL packets use a fixed 2-byte length field (m_HeaderNalSize = 2).
    m_pStartCodeIter.Init((BYTE *)pMPEG2VidInfo->dwSequenceHeader,
                          pMPEG2VidInfo->cbSequenceHeader, m_HeaderNalSize);
    while (m_pStartCodeIter.ReadNext())
    {
        nNalDataLen  = m_pStartCodeIter.GetDataLength();
        pNalDataBuff = m_pStartCodeIter.GetDataBuffer();

        switch (m_pStartCodeIter.GetType())
        {
        case NALU_TYPE_SPS:
        case NALU_TYPE_PPS:
            // Prepend a start code, then copy the NAL payload.
            tempBuffer.insert(tempBuffer.end(), m_StartCodeBS.Data, m_StartCodeBS.Data + 4);
            tempBuffer.insert(tempBuffer.end(), pNalDataBuff, pNalDataBuff + nNalDataLen);
            break;
        default:
            sts = MFX_ERR_MORE_DATA;
            break;
        }
    }

    if (tempBuffer.size())
    {
        pBS->Data = new mfxU8[tempBuffer.size()];
        pBS->DataLength = pBS->MaxLength = (mfxU32)tempBuffer.size();
        memcpy(pBS->Data, &tempBuffer.front(), tempBuffer.size());

        // Keep a copy of the SPS/PPS to place into the decode stream later.
        m_Headers.Data = new mfxU8[tempBuffer.size()];
        m_Headers.DataLength = m_Headers.MaxLength = (mfxU32)tempBuffer.size();
        memcpy(m_Headers.Data, &tempBuffer.front(), tempBuffer.size());
        tempBuffer.clear();
    }

    return sts;
}

As each NAL packet is identified, the offset to the actual data location is stored. This is the location where the existing NAL length prefix needs to be replaced with a start code. The insertion of the start code is achieved by using a temporary buffer to reformat the bitstream. When the CFrameConstructor base class was instantiated, a simple bitstream was created with a start code as its only data and stored in m_StartCodeBS.Data. This bitstream is then reused every time a new start code needs to be inserted. After the loop has completed, the temporary buffer should look like 0x00000001 (SPS data) 0x00000001 (PPS data), and so on.
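
Building that reusable start-code bitstream is a one-time setup. A minimal sketch of the equivalent initialization, assuming the 4-byte Annex B start code used throughout this paper, would be:

// One-time setup (e.g., in the frame constructor's constructor): a 4-byte
// Annex B start code that is copied in front of every NAL unit.
static const mfxU8 startCode[4] = { 0x00, 0x00, 0x00, 0x01 };
ZeroMemory(&m_StartCodeBS, sizeof(mfxBitstream));
m_StartCodeBS.Data = new mfxU8[sizeof(startCode)];
m_StartCodeBS.DataLength = m_StartCodeBS.MaxLength = sizeof(startCode);
memcpy(m_StartCodeBS.Data, startCode, sizeof(startCode));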

The newly formatted bitstream is now needed in two places. First, the data is used to initialize the decoder via DecodeHeader(), as seen above in "Filter Negotiation"; it also needs to be passed to the Decode() function when decoding first starts. ReadAVCHeader() therefore saves the header bitstream to the member variable m_Headers for future use.

The other function needed to implement MP4 playback is CAVCFrameConstructor::ConstructFrame(). This function is very similar to CAVCFrameConstructor::ReadAVCHeader() in that it loops through the incoming bitstream, detecting NAL packets and replacing their length prefixes with start codes. Two differences from ReadAVCHeader() are worth mentioning. First, ConstructFrame() inserts the SPS/PPS data into the bitstream for the first frame; the Decode function needs these values to begin decoding subsequent frames.

ConstructFrame() also manages the residual data that was not consumed by prior calls to Decode. If ConstructFrame() detects that residual data is present, it is placed ahead of the new frame data in the output bitstream. This behavior is identical to that for non-AVC data streams.

mfxStatus CAVCFrameConstructor::ConstructFrame(IMediaSample *pSample, mfxBitstream *pBS)
{
    mfxStatus            sts = MFX_ERR_NONE;
    mfxU32               nDataSize = 0;
    mfxU8               *pDataBuffer = NULL;
    std::vector<mfxU8>   tempBuffer;
    REFERENCE_TIME       rtStart(0), rtEnd(0);
    StartCodeIteratorMP4 m_pStartCodeIter;
    mfxU32               nNalDataLen;
    mfxU8               *pNalDataBuff;

    CHECK_POINTER(pSample, MFX_ERR_NULL_PTR);
    CHECK_POINTER(pBS, MFX_ERR_NULL_PTR);

    nDataSize = pSample->GetActualDataLength();
    if (0 == nDataSize)
    {
        sts = MFX_ERR_MORE_DATA;
    }

    if (MFX_ERR_NONE == sts)
    {
        pSample->GetPointer(&pDataBuffer);
        CHECK_POINTER_SET_STS(pDataBuffer, sts);
    }

    if (MFX_ERR_NONE == sts)
    {
        // Walk the sample's length-prefixed NAL packets (m_NalSize was saved
        // from MPEG2VIDEOINFO.dwFlags during negotiation, typically 4).
        m_pStartCodeIter.Init(pDataBuffer, nDataSize, m_NalSize);
        while (m_pStartCodeIter.ReadNext())
        {
            nNalDataLen  = m_pStartCodeIter.GetDataLength();
            pNalDataBuff = m_pStartCodeIter.GetDataBuffer();

            tempBuffer.insert(tempBuffer.end(), m_StartCodeBS.Data, m_StartCodeBS.Data + 4);
            tempBuffer.insert(tempBuffer.end(), pNalDataBuff, pNalDataBuff + nNalDataLen);
        }

        if (tempBuffer.size())
        {
            // Output layout: [SPS/PPS headers (first frame only)][residual data][new frame data]
            mfxU32 nTotalSize = (mfxU32)tempBuffer.size() +
                                m_mfxResidialBS.DataLength + m_Headers.DataLength;
            pBS->Data = new mfxU8[nTotalSize];
            pBS->DataLength = pBS->MaxLength = nTotalSize;

            mfxU32 nOffset = 0;
            if (m_Headers.DataLength > 0)
            {
                memcpy(pBS->Data, m_Headers.Data, m_Headers.DataLength);
                nOffset += m_Headers.DataLength;
                m_Headers.DataLength = 0;
                m_Headers.Data = NULL;
            }
            if (m_mfxResidialBS.DataLength)
            {
                memcpy(pBS->Data + nOffset, m_mfxResidialBS.Data, m_mfxResidialBS.DataLength);
                nOffset += m_mfxResidialBS.DataLength;
            }
            memcpy(pBS->Data + nOffset, &tempBuffer.front(), tempBuffer.size());

            tempBuffer.clear();
            m_mfxResidialBS.DataLength = 0;
        }

        if (MFX_ERR_NONE == sts)
        {
            pSample->GetTime(&rtStart, &rtEnd);
            pBS->TimeStamp = ConvertReferenceTime2MFXTime(rtStart);
        }
    }
    return sts;
}

Conclusion

H.264 video data packaged in the MP4 container format needs to be reformatted into an Annex B style bitstream for use with the Intel® Media SDK. This whitepaper has demonstrated one possible way to reformat the data, using the Intel Media SDK's own DirectShow sample filter as an example. Adding this support to the sample filter allows greater interoperability with today's media players, such as Media Player Classic Home Cinema.


References

[1] Media Player Classic Home Cinema, http://mpc-hc.sourceforge.net/
[2] ITU-T Recommendation H.264 (05/2003), Advanced Video Coding for Generic Audiovisual Services.
[3] DirectX Video Acceleration Specification for H.264/AVC Decoding, December 14, 2007.

About the Author

Eric Sardella is a senior software engineer in the Software and Services Group at Intel. He has been with Intel for 12 years and has spent most of that time working with graphics. Eric has a BS in Computer Science from California State University, Chico. He is a husband, dad, gamer, fly fisher, and racquetball player who is currently building his own house in scenic Newcastle, CA.

For more complete information about compiler optimizations, see our Optimization Notice.