DecodeFrameAsync returning stagnant timestamps for certain files


These flash files:
http://blog.iamjvn.com/2011/03/san-antonio-2011-video.html#more

exhibit stagnant timestamp behavior when calling DecodeFrameAsync(). I am using 2.0.12.24071 (2.0 gold).

The test case looks like this:
sts = m_pmfxDEC->DecodeFrameAsync(&m_mfxBS, &(m_pmfxSurfaces[nIndex]), &pmfxOutSurface, &syncp);

.
. (same as in the pipeline sample code)
.

if (MFX_ERR_NONE == sts)
...
sts=WriteFrame(pmfxOutSurface);

What happens is that, prior to a *MFX_WRN_DEVICE_BUSY interval, the DataLength in the m_mfxBS structure queues up about 4 frames. When this happens, and as the loop starts to process them, pmfxOutSurface->Data.TimeStamp carries the last PTS for all 4 frames. So, for example:

If m_pmfxSurfaces[] should have had:
[10] = 936
[11] = 969
[12] = 1003
[13] = 1036
[14] = 1070
[15] = 1103

This is what I get
[10] = 936 (good)
[11] = 1070 (bad)
[12] = 1070
[13] = 1070
[14] = 1070
[15] = 1103

Based on this forum entry
http://software.intel.com/en-us/forums/showthread.php?t=70446

I'd expect pmfxOutSurface->Data.TimeStamp to work properly, and my code matches the example exactly. Is there something I am missing? Are these files correct? They work with all other decoders, including ffmpeg. Is this PTS duplication expected behavior? Is there a way to make this work? I would hate to resort to hacking together a solution! I hope to hear a reply; I hope Nina gets this message.

*MFX_WRN_DEVICE_BUSY: the way I determined this to be true is that I can manipulate the timing by injecting a sleep. The actual queuing of the frames may then take place in a different frame range interval, and the same set of frames that I test (over and over again) succeeds just fine.


Hi James,
Could you try processing the MFX_WRN_DEVICE_BUSY status by simply waiting and repeating the same DecodeFrameAsync call (without making changes to the input bitstream)? In your case you seem to read more data into the input bitstream and increase bitstream.TimeStamp each time.
If you operate with timestamps, you should not store more than 1 frame in the input bitstream, because pmfxOutSurface->Data.TimeStamp is determined based on m_mfxBS.TimeStamp.
Please let us know whether this helps or not.
Regards, Nina

I've attached a code snippet of the current function; it is virtually identical to the pipeline sample code, and (to the best of my knowledge) upon receiving MFX_WRN_DEVICE_BUSY it will wait and repeat the same DecodeFrameAsync call without making changes to the input bitstream. It should be noted that the accumulation within the bitstream happens BEFORE I see this message, and consistently just before it.

I never store more than 1 frame in the input bitstream myself; rather, this happens implicitly, because the frame will not process until the 3 or 4 future frames have accumulated in the bitstream. This "implicit" functionality is identical to CSmplBitstreamReader::ReadNextFrame(), where DecodeFrameAsync() manages DataLength and DataOffset accordingly.

The key question here is why it does not always transfer the input bitstream packet at DataOffset consistently when I call DecodeFrameAsync; is this expected behavior? It will skip this transfer and instead add to DataLength about every 10 frames or so, and not necessarily in the same group of frames. The rest of what you said, regarding one frame/timestamp in the bitstream per successful DecodeFrameAsync() frame creation, makes sense, as I can now see how to safely create a workaround.

For now I propose a workaround: assign the *correct* timestamp prior to the DecodeFrameAsync call. I can determine it by keeping a count of how many frames have accumulated in the bitstream per frame session. This is probably not the cleanest solution, but it should be effective.

Thanks for telling me when the timestamp gets assigned to pmfxOutSurface->Data. I'll post back if this works. Let me know if you have any insight into why the input bitstream frames accumulate like this.

-James

Attachments: Download DecodeFrame.cpp (2.08 KB)

Here is a solution that works against all my test media files. It should illustrate the problem, but it would also be good to review, as I am making some assumptions about the behavior of the input bitstream. The idea is that when the input bitstream queues up, so do the timestamps. Ideally, it would be great if the SDK could manage this queue for me, assuming this can be expected behavior.

-James

Attachments: Download FrameDecode_rev2.cpp (2.78 KB)

Hi James,
I've studied the code you provided, and I see that the processing of the DEVICE_BUSY status is not the cause of the problem as I thought before.

I'm really puzzled by the issue you observe; it doesn't look like correct behavior. The workaround is fine, but it essentially means bypassing Media SDK in terms of timestamp setting. And as you say, the SDK must be able to handle timestamps properly; it is designed to do so.

Does the AVFrame *picture contain a whole frame each time? Could you debug to see which status is returned from DecodeFrameAsync when the bitstream accumulation happens? The decoder should return MORE_DATA if there is not enough data to decode a frame. And in any case the data shouldn't accumulate in the bitstream; the decoder can leave only a few irrelevant bytes.

Below is the description of the expected decoder behavior:

"The input bitstream bs can be of any size. If there are not enough bits to decode a frame, the function returns MFX_ERR_MORE_DATA, and consumes all input bits except if a partial start code or sequence header is at the end of the buffer. In this case, the function leaves the last few bytes in the bitstream buffer. If there is more incoming bitstream, the application should append the incoming bitstream to the bitstream buffer. Otherwise, the application should ignore the remaining bytes in the bitstream buffer and apply the end of stream procedure described below. If more than one frame is in the bitstream buffer, the function decodes until the buffer is consumed. The decoding process can be interrupted for events such as if the decoder needs additional working buffers, is readying a frame for retrieval, or encountering a new header. In these cases, the function returns appropriate status code and moves the bitstream pointer to the remaining data. It is recommended that the application invoke the function repeatedly until the function returns MFX_ERR_MORE_DATA, before appending any more data to the bitstream buffer."

It would be great if you could model your application behavior with sample_decode or maybe the DirectShow filters. I need a reproducer to understand what exactly is happening. Could you please try?

By the way, you shouldn't alter working surface timestamps while the surface is locked, and it is likely to be locked after the DecodeFrameAsync call. This might not be relevant to the problem; just for your information.

Best regards, Nina

"
Does the AVFrame *picture contain a whole frame each time?
"
Yes. In fact, one of the tests I have done is to physically copy the bitstream memory from a time it *fails against a time it succeeded, to verify they were identical and exonerate the FLV muxer.

*fails: meaning it accumulated the data when it should have consumed it.

"
Could you debug to see which status is returned from DecodeFrameAsync when bitstream accumulation happens?
"
It always returns MFX_ERR_NONE for all 4 frames (just verified as I typed this).

"
And in any case the data shouldn't accumulate in the bitstream, decoder can leave only a few irrelevant bytes
"
So far, of all the files I have tested (e.g. other flash files, mp4, m2t), I have not seen this problem; there is something about this group of files (see the initial post with the link).

"
It would be great if you could model your application behavior with sample_decode or maybe the DirectShow filters. I need a reproducer to understand what exactly is happening. Could you please try?
"

I have modeled the application as closely to sample_decode as possible. If you need help putting together some code to reproduce it, let me know. I am not using any DirectShow code.

"
Btw, you shouldn't alter working surface timestamps if the surface is locked - and likely it would be locked after DecodeFrameAsync code. But this might be not relevant to the problem - just for your information.
"

I presume you saw this comment:
//This serves no purpose but is very useful for debugging

I have commented this out by default in my current running build... I just used it temporarily to help see what was going on during this issue.

-James

I got to thinking that there may be some other variables involved which may make this harder to reproduce on your end (e.g. subtle differences in avcc-to-annex-b conversion). As a fallback, I could submit a raw elementary stream dump in annex-b form, which should be reproducible with the sample_decode executable, with the exception that the read-next-frame may not advance in frame packets (this may be necessary). Let me know if we need to go that route. Basically, for me the ffmpeg FLV demuxer presents packets (i.e. the entire frame); the audio is pruned out to the AAC codec, and the video first gets converted to annex-b and then submitted to DecodeFrameAsync(). The conversion to annex-b is straightforward, as it simply converts the entire frame. The logic for appending the frames is identical to sample_decode in regard to interpreting DataLength and DataOffset.

Hi James!
I think I now know what is wrong. You see, according to the specification (and the Media SDK developers confirmed this), the situation where the decoder returns MFX_ERR_NONE and doesn't consume the input bits is totally legal. That's why the manual "recommends" appending data to the bitstream only when the decoder explicitly requests MFX_ERR_MORE_DATA. This recommendation is crucial for applications that deal with timestamps, and that's exactly your case. Could you try invoking your "ReadNextFrame" only if MFX_ERR_MORE_DATA is returned? This should fix the problem.
On another point, I would say that this behavior is not really well described in the Media SDK manual, so I will work with the team to improve the documentation.
Thank you for tracking down this detail.
Best regards, Nina

Yes, this explains the issue, and I have "partially" confirmed that it works. I say partially because my workflow is currently incompatible with this suggested change, given its demux environment. My actual solution would involve either queueing timestamps (as I currently do) or queueing extended locked surfaces. I agree with the suggested action item listed above. Thanks for your help on this matter.

I needed a little bit of time to think this through. I'd like to start by saying that while the timestamp solution I presented earlier does work, it is a bit fragile, and I would like to end this discussion with something a bit more solid and robust.

Also, I want to present a bird's-eye view of my workflow here, in hopes that it gives perspective on how I ran into this issue.

The current workflow, in a simple model, is the case where I simply wish to obtain the next frame sequentially, as if I were to play the video:

UncompressedFrame=ReadNextFrame()

Inside this function it looks something like this:

while (!UncompressedFrame)
    UncompressedFrame = decode(next compressed frame);

This is a really oversimplified model that is only somewhat accurate: it always consumes a compressed frame to produce an uncompressed frame. With some MPEG-2-type codecs, this model could work around P and B frames by keeping a lean queue internally.

As we see with this model, using the Intel codec there are times when we need some form of input queue control until the codec says it is ready for more compressed frames.

I propose the solution be a "smart" input bitstream that can manage this for me. I'd also attempt to encapsulate the data length and data offset, so the client code wouldn't need to manage them. The interface would be as simple as adding "packets" into it, and it would handle both when to submit to the current working bitstream and the timestamps they correspond to.

If this solution seems to be in the right direction, and if others could benefit from this let me know, and I'd be happy to submit it.

-James

Hi James,
I think your solution would be of great value to our Media SDK developer community. I would appreciate it if you posted it here on the forum.
Thank you so much for your contribution!
Nina

OK, will do, but at the moment I am going to have to put this on the back burner, as some new H.264 stress media has come to me which challenges this codec. I'm not going to go into details about it here, except to say keep an eye out for me during this week (I'll want to confirm exactly where the problem is before I post). Once I get these resolved (hopefully) I can finish this.

Attached is a class called BitstreamManager()

It does as I had hoped: the client code need not mess with the internal fields of the bitstream.

Please review the code, and let me know if there are any proposed changes.
Thanks.

Attachments: Download BitstreamManager.cpp (4.19 KB)

Hi James,
I reviewed the code and have a comment: in the AddToBitstream function you check for m_mfxBS.DataLength == 0 as an indicator that the decoder has taken the data from the bitstream (only then do you add new data). This condition is not quite correct, as the decoder may take the frame data but leave a few bytes in the bitstream buffer (a partial start code). I would recommend relying on the function return statuses (MFX_ERR_MORE_DATA), according to the spec, rather than on bitstream structure field values.
Regards,Nina

Yes, that line was making an assumption, and thanks for clearing this up. So, in regard to the "partial start code" case: could it ever happen if the bitstream submitted to DecodeFrameAsync() always ended on frame boundaries?

I have one other question
"
may take the frame data but leave a few bytes in the bitstream
"

When this happens, will mfxBS.DataLength be 0?

Hi James,
I did a small investigation regarding your questions. In the default mode, even if you submit frame-wise bitstreams, the decoder may leave several bytes that look like part of a start code (even if they do not actually belong to a start code). But there is another mode, which is actually preferred from a performance standpoint, where you explicitly inform the decoder that you will feed whole frames: for that you need to set the flag on the bitstream, DataFlag = MFX_BITSTREAM_COMPLETE_FRAME.
Regards,Nina

If the decoder leaves a few bytes in the bitstream, DataLength will not be 0 but will be equal to that "few" value.

Thanks so much for mentioning the MFX_BITSTREAM_COMPLETE_FRAME mode. I have had a chance to test it, and for my MP4 and flash clip collection, where we convert to annex-b, it was successful. Unfortunately, clips from Canon Vixia cameras fail. :( Canon Vixia clips are natively annex-b and self-embed the SPS/PPS within the frames. When these fail, they show a primarily blank grey canvas with several macroblocks of real video flashing. It would be nice to know if anyone else can reproduce this with these files. Finally, just to be clear, Canon Vixia files work fine if I do not use this flag.

Let's talk for a minute about the case where a frame gets consumed but leaves several bytes that look like part of a start code. If I loop again and call DecodeFrameAsync with just this, is it safe to say that it should yield MFX_ERR_MORE_DATA? And would it still keep this memory in the bitstream? The follow-up to this would be: what would be the consequence of submitting these bytes, with the next frame appended, and calling DecodeFrameAsync()? I'd like to find such a file and test this. The reason I have that logic in there is that every file I have (except for the flash files in this case) consumes on the first call to DecodeFrameAsync(); this means I save an extra memcpy of the input stream for most of the clips I have tested. Yes, nowadays it's probably not a significant performance gain, but it's the idea of saving extra work that has me fighting to keep it in our code. ;)

Hi James,
I think the problem with the Canon Vixia files you mention is exactly in the SPS/PPS headers. If you set the COMPLETE_FRAME flag, you need to feed only frame data (or full start codes and full SPS/PPS). Those few bytes that the decoder may leave in the regular mode can be not only a partial start code but also a partial sequence header. If you set the flag and feed data with headers, the decoder seems to simply use the parts of headers for decoding.
If you loop again after bytes are left (the decoder already returns MORE_DATA when leaving those bytes), the decoder will return MORE_DATA again. You should append new data to the bitstream; those few bytes will get consumed only with the new portion of data. The main idea is to not break data continuity, which means you should not remove any bytes from the bitstream.
At this stage my general recommendation is to rely on the spec and the function return statuses rather than investigating these complicated details. You may still miss some corner cases. Programming based on return statuses is far simpler and more reliable.
Nina

Thanks for the quick turnaround reply :)

I agree with your standing on sticking with the spec, as I'd probably do the same if I were you, and therefore agree this aspect of the case is closed. The questions you have answered for me have been of great value to me and our company, and I'll take responsibility for any corner cases that may crop up.

I am concerned about the Canon Vixia case, as I have spent the past hour verifying exactly what I submit to DecodeFrameAsync (i.e. full, complete frames). Let me know if we should open a separate case for this.

Hi James,
Ok, deal :)
As for the Canon Vixia streams, could you share one with me? I would try it on our internal test application, which supports the COMPLETE_FRAME functionality.
I suggest we go on with the discussion in this thread, as we already have a valuable piece of history here.
Nina

Here is a recorded clip from the Canon Vixia HF200 camera. I changed the extension from MTS to m2t, but everything else is left intact. Let me know if you can reproduce any symptoms using the COMPLETE_FRAME flag. Thanks.

Attachments: Download 00002.m2t (32.15 MB)

Hi James!
Our internal app decodes this stream fine, both with and without the COMPLETE_FRAME flag. Do you think you could try to form a reproducer for me based on sample_decode plus your specific bitstream reading algorithm?
Nina

Thanks for testing this. Knowing that it is supposed to work gives me direction in pursuing why it does not in our code. I'll try to work this out and also consider a submission here if I cannot figure it out. I'll let you know what I find.

OK, I know what it is... this file is fielded, and I needed to submit a pair of fields when calling DecodeFrameAsync(). (I like these simple problems.) :)

Great to hear you solved it! :) I will remember to suggest checking this point if anyone has a similar problem. Thanks for telling me.
Nina

If it is easy to add, it would be cool to have a COMPLETE_FIELD flag, or to make it support complete fields. For now I'm only going to apply the optimization for progressive formats.

OK, I understand your point; it's a bit complicated to deal with fields. I will pass this request to the architecture team, and we'll see what can be done.

"
Our internal app decodes this stream fine, both with and without the COMPLETE_FRAME flag. Do you think you could try to form a reproducer for me based on sample_decode plus your specific bitstream reading algorithm?
"

When you tested this clip, was the machine testing with the Sandy Bridge hardware enabled? Today I have been doing some preliminary testing with an Intel Sandy Bridge beta unit. I have found that all of my MP4 clips (fielded or not) work fine, but no luck with the Canon Vixia HF200 clips (like the one you tested here). It works if I use MFX_IMPL_SOFTWARE, but fails using MFX_IMPL_AUTO_ANY. The way it fails is that the DecodeFrameAsync calls *always* return MFX_ERR_MORE_DATA.

Any test results on this clip using the hardware will be very helpful, and we can go from there.
Meanwhile, I'm going to conduct more tests to see if I can make sense of what is happening.

Hi James,
I tested the software library that time. I will try on Sandy Bridge and let you know. By the way, what is the graphics driver version on your system?
Nina

I checked decoding of your clip using the COMPLETE_FRAME path on a Sandy Bridge system with driver 15.21.13.2342; it works fine. Nina

Currently, I have tested with 15.21.64.2219. I'll get the latest drivers and test again... Thanks for letting me know which version passed for you. I'll let you know once this test has finished...

I can only access version 15.21.64.2219 using the Premier or support sites. Can I either test 15.21.13.2342, or can you test with 15.21.64.2219? In the meantime, I'll run a few other tests tomorrow and see how the sample code behaves.

Thanks.

Hi James, you may try to install the 2342 driver from here (64-bit version).

Thanks for this link... I have it installed and can use 32-bit builds for testing the older version. It appears both versions work fine with these files, and that there is something happening within my code to cause the failure. Since the sample decode is working for me, I should be able to work it out. If the findings are good for the group, I'll post them. Thanks again.

I have worked out what the problem was, and there is something worth noting about it: when testing this file using software, it consumes 98,195 bytes on the initial pass, whereas in hardware it consumes 273,986 bytes on the first pass. In my code I set up a timeout of up to 25 packets, and I needed 31 packets for initial startup. This is an easy fix, but it is surprising how the hardware requires more frames to initialize.


Hi James, good to hear you found the reason. The difference you observe is totally normal: the SW and HW decoders simply buffer different numbers of frames. The advice would be to not rely on fixed numbers; it depends on the DPB size and the number of threads, so it may vary.
Nina

This number is a detail of the implementation and may change in future Media SDK implementations and with future HW platforms, while the API (the return codes, etc.) will remain backward compatible through all the versions. Just another warning :)

Hi,

I am also facing a similar issue while decoding an AVC stream. Please find the attached file (modified code from simple_decode.cpp) for your reference. Basically, I am trying to get SEI messages of type 4, which is closed caption data, along with their timestamps. In this particular code I am also trying to get the POC, which I am not getting.

I am not getting any of the above three parameters: it starts from a different SEI message, the timestamp always shows a value of "0", and the POC is the other parameter I am looking for. Please let me know if I have done anything wrong here. Your blog did not give an answer, hence I need your help in figuring this out. Please help me.

Regards,
Vinay

Attachments: Download simple_decode.cpp (11.54 KB)
