NAL unit types produced by the H264 encoder

Hi,

I'm trying to use the Intel Media SDK to create an H.264 stream, pack it into RTP, and play it back with gstreamer. So far so good, and a lot of answers on this forum have helped me. Currently I'm having trouble understanding whether the bitstream is valid in terms of the NAL units present, and I need some clarification on this topic.

I have the following questions:

1) The Annex B format is the only bitstream format produced by Intel Media SDK, right? It is just a little inconvenient to run through the bitstream looking for 0x00 0x00 0x00 0x01.

2) How do I disable NAL units of type 9 (access unit delimiter)? As far as I understand I don't need them in the RTP stream; at least I haven't seen real-life H.264 over RTP streams sending them. I tried passing an mfxExtCodingOption structure like this:

mfxExtCodingOption EncodingOptions;
memset(&EncodingOptions, 0, sizeof(EncodingOptions));
EncodingOptions.Header.BufferId = MFX_EXTBUFF_CODING_OPTION;
EncodingOptions.Header.BufferSz = sizeof(EncodingOptions);
EncodingOptions.AUDelimiter = MFX_CODINGOPTION_OFF;

but the encoder initialization always returns that this is unsupported.

3) During encoding I get only SPS, PPS and SEI units. Is that normal? Shouldn't I get some coded slice NAL units as well? I added some output to my program that lists the frame flags and the NAL units found in the bitstream. Can you tell me if this is expected behavior? If so, maybe there is an issue in gstreamer, because it seems to be throwing a lot of errors.

[6.5.2014 23:2:52.319]: Bitstream generated, frame type: MFX_FRAMETYPE_I|MFX_FRAMETYPE_REF|MFX_FRAMETYPE_IDR
[6.5.2014 23:2:52.320]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:52.322]: NAL unit with, type: 7, size: 39
[6.5.2014 23:2:52.323]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:52.324]: NAL unit with, type: 6, size: 3926
...
[6.5.2014 23:2:52.443]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:52.446]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:52.449]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:52.451]: NAL unit with, type: 6, size: 23
...
[6.5.2014 23:2:52.913]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:52.916]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:52.919]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:52.925]: NAL unit with, type: 6, size: 303
...
[6.5.2014 23:2:52.976]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:52.980]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:52.983]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:52.986]: NAL unit with, type: 6, size: 122
...
[6.5.2014 23:2:53.4]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:53.6]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:53.9]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:53.10]: NAL unit with, type: 6, size: 94
...
[6.5.2014 23:2:53.48]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:53.48]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:53.50]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:53.52]: NAL unit with, type: 6, size: 118
...
[6.5.2014 23:2:53.90]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:53.92]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:53.93]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:53.95]: NAL unit with, type: 6, size: 102
...
[6.5.2014 23:2:53.152]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:53.158]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:53.162]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:53.165]: NAL unit with, type: 6, size: 144
...
[6.5.2014 23:2:53.187]: Bitstream generated, frame type: MFX_FRAMETYPE_P|MFX_FRAMETYPE_REF
[6.5.2014 23:2:53.191]: NAL unit with, type: 9, size: 2
[6.5.2014 23:2:53.195]: NAL unit with, type: 8, size: 4
[6.5.2014 23:2:53.198]: NAL unit with, type: 6, size: 101

 


Now that I look at the bitstream in memory, I also see values like 00 00 01 25. Does the encoder use both 00 00 01 and 00 00 00 01 as prefixes for the NAL units?

Just to clarify, so far I was looking only for the prefix 00 00 00 01, not 00 00 01. Looking at how Google Chrome creates the Annex B bitstream from H.264 (http://src.chromium.org/svn/branches/1312/src/media/filters/h264_to_annex_b_bitstream_converter.cc), I understand that the PPS/SPS/SEI/AUD always have an extra zero byte, so I should be looking for both prefixes, right?
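For readers following along: in a sequence such as 00 00 01 25, the byte right after the start code is the one-byte NAL unit header defined by the H.264 spec (1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type). A minimal sketch of decoding it is below. The names NalHeader and parse_nal_header are made up for illustration and are not part of the Media SDK.

```cpp
#include <cstdint>

// Illustrative type, not a Media SDK structure.
struct NalHeader {
    bool         forbidden_zero_bit;  // must be 0 in a valid stream
    std::uint8_t nal_ref_idc;         // 0 means "not used for reference"
    std::uint8_t nal_unit_type;       // 5 = IDR slice, 6 = SEI, 7 = SPS, 8 = PPS, 9 = AUD
};

// Decode the first byte that follows an Annex B start code.
constexpr NalHeader parse_nal_header(std::uint8_t b) {
    return NalHeader{ (b & 0x80) != 0,
                      static_cast<std::uint8_t>((b >> 5) & 0x03),
                      static_cast<std::uint8_t>(b & 0x1F) };
}

// 0x25 = 0 01 00101: nal_ref_idc 1, nal_unit_type 5 (coded slice of an IDR picture).
static_assert(parse_nal_header(0x25).nal_unit_type == 5, "0x25 is an IDR slice header");
static_assert(parse_nal_header(0x25).nal_ref_idc == 1, "0x25 has nal_ref_idc 1");
```

Under this reading, the 00 00 01 25 seen in memory is the start of an IDR slice. A scan that only matches the 4-byte prefix would count such a slice as part of the preceding NAL unit, which would be consistent with the large "type: 6" sizes in the log above.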

Best Reply

Look at the H.264 standard: http://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.264-201304-S!!P...

You can find almost all answers there.

 

Yeah, thanks, I found everything I needed in the spec. Just to clarify the first question: are there any plans to support bitstream formats other than Annex B?

Media SDK h264 encode is annex B only.  No plans to support other formats.

 

Quote:

Jeffrey Mcallister (Intel) wrote:

Media SDK h264 encode is annex B only.  No plans to support other formats.

 

Thanks for the reply. I was asking only because, at least in my environment, the encoder puts SPS and PPS together with the I and P frames, so when performing RTP packetization I need to run through the buffer looking for start codes to split the bitstream into NAL units. Is there any way (that I am not aware of) to know how many NAL units are in a given bitstream returned in an mfxBitstream?
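For anyone with the same problem, the scan described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not something the Media SDK provides: NalSpan and split_annex_b are hypothetical names, and the input is assumed to be one complete encoder output buffer (for example the used portion of an mfxBitstream). It matches the 3-byte start code 00 00 01, which also covers the 4-byte form, and drops zero bytes in front of the next start code. That is safe because the spec does not allow a NAL unit to end in 0x00.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: position of one NAL unit inside the buffer.
struct NalSpan {
    std::size_t offset;  // index of the NAL header byte (first byte after the start code)
    std::size_t size;    // NAL unit length in bytes, start code excluded
};

// Split an Annex B buffer into NAL units; handles 00 00 01 and 00 00 00 01.
std::vector<NalSpan> split_annex_b(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<NalSpan> nals;
    bool have_nal = false;      // true once the first start code has been seen
    std::size_t nal_start = 0;  // offset of the NAL unit currently being scanned
    std::size_t i = 0;

    while (i + 3 <= size) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (have_nal) {
                // Zero bytes before a start code belong to the start code, not the NAL unit.
                std::size_t end = i;
                while (end > nal_start && data[end - 1] == 0) --end;
                nals.push_back({nal_start, end - nal_start});
            }
            nal_start = i + 3;
            have_nal = true;
            i += 3;
        } else {
            ++i;
        }
    }

    if (have_nal) {
        // The last NAL unit runs to the end of the buffer, minus trailing zero padding.
        std::size_t end = size;
        while (end > nal_start && data[end - 1] == 0) --end;
        nals.push_back({nal_start, end - nal_start});
    }
    return nals;
}
```

The number of NAL units in the buffer is then the size of the returned vector, and data[span.offset] & 0x1F gives the type of each one. The payload is left untouched, including any 00 00 03 emulation prevention bytes, which is what RTP packetization expects.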

It looks like we are doing similar work! I have been analyzing the bitstream recently as well.

1) As described in the standard (ISO/IEC 14496-10), we should look for the NALU start code by checking whether next_bits(24) equals 0x000001. I do this by first checking whether the current byte equals 0x00, which saves some time.

2) In my experiments, if the CPU supports hardware-accelerated encoding, the delimiter NALU (type == 9) is not present. I only found delimiters when using the software encoder.

3) When scanning the encoded stream output by the video conferencing sample, I got the following result. Note that I enabled the SVC temporal scalability feature.

Processing started
Frame    0, type=I, latency=16.52 ms, parse= 0.12 ms, length= 17482 B, nal[0](0,Delim), nal[1](1,SPS), nal[2](1,PPS), nal[3](0,SEI), nal[4](1,SVCPre){prid=0, tid=0}, nal[5](1,I)
Frame number: 1

Frame    1, type=P, latency=10.92 ms, parse= 0.03 ms, length=  4645 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](0,SVCPre){prid=3, tid=3}, nal[3](0,P)
Frame    2, type=P, latency=15.59 ms, parse= 0.04 ms, length=  6004 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](1,SVCPre){prid=2, tid=2}, nal[3](1,P)
Frame    3, type=P, latency=14.24 ms, parse= 0.03 ms, length=  4954 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](0,SVCPre){prid=3, tid=3}, nal[3](0,P)
Frame    4, type=P, latency=15.87 ms, parse= 0.04 ms, length=  6900 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](1,SVCPre){prid=1, tid=1}, nal[3](1,P)
Frame    5, type=P, latency=14.64 ms, parse= 0.03 ms, length=  4055 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](0,SVCPre){prid=3, tid=3}, nal[3](0,P)
Frame    6, type=P, latency=13.72 ms, parse= 0.03 ms, length=  4120 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](1,SVCPre){prid=2, tid=2}, nal[3](1,P)
Frame    7, type=P, latency=13.98 ms, parse= 0.03 ms, length=  3715 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](0,SVCPre){prid=3, tid=3}, nal[3](0,P)
Frame    8, type=P, latency=12.87 ms, parse= 0.04 ms, length=  6571 B, nal[0](0,Delim), nal[1](0,SEI), nal[2](1,SVCPre){prid=0, tid=0}, nal[3](1,P)
......

This is part of the output from the software encoding process. I found that the last NALU (nal[5] in the I frame and nal[3] in the P frames) is the coded slice. If the demo runs on a PC with hardware encoding support, the delimiter NALU (nal[0]) is not present.
