How to make efficient use of async decoding?

I'm trying to understand whether there's a more efficient way of doing things than what I'm doing now. Based on the decode samples, I have a single-threaded model like this:

  • Add data to the bitstream as needed
  • Call DecodeFrameAsync until it returns MFX_ERR_NONE
  • Call SyncOperation on the syncPoint returned by DecodeFrameAsync
  • Enqueue the frame for later use (the same thread can be used to take the frame and render)

It seems counter-productive to start an asynchronous operation only to wait for its completion immediately after.

So let's imagine a threaded model then:

Thread A (producer):

  • Add data to the bitstream as needed
  • Call DecodeFrameAsync until it returns MFX_ERR_NONE
  • Enqueue the frame and its syncPoint
  • Continue while there are available surfaces

Thread B (consumer):

  • Dequeue the next frame and syncPoint
  • Call SyncOperation to obtain the data
  • Wait for more frames to enter the queue

Is this really any more performant? We're still calling SyncOperation the exact same number of times, so we're paying the same synchronisation costs. The only potential gain I can see is that if the queue is large enough and usually full, there is a better chance that the decode will have finished by the time we call SyncOperation. But is this worth the extra complexity and the inevitable contention on a queue? Never mind some kind of polling mechanism for Thread B to detect when frames enter the queue, which isn't free either.
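For reference, here is a rough sketch of the two-thread model above. It does not use the Media SDK: `std::async` and `std::future` are stand-ins I'm assuming for DecodeFrameAsync and its syncPoint, and `PendingFrame`, `FrameQueue` and `run_two_thread_decode` are hypothetical names. It shows that Thread B can block on a condition variable instead of polling, though the queue lock is still there.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical: a frame index plus a future standing in for the syncPoint.
struct PendingFrame {
    int index;
    std::future<int> sync_point;
};

// Minimal blocking queue: pop() sleeps on a condition variable, so the
// consumer does not need to poll.
class FrameQueue {
public:
    void push(PendingFrame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push(std::move(frame));
        }
        ready_.notify_one();
    }

    // Signals that no more frames will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Returns false once the queue is closed and fully drained.
    bool pop(PendingFrame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !frames_.empty() || closed_; });
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<PendingFrame> frames_;
    bool closed_ = false;
};

// Thread A (the caller) submits work; Thread B waits on each sync point in order.
std::vector<int> run_two_thread_decode(int frame_count) {
    FrameQueue queue;
    std::vector<int> rendered;

    std::thread consumer([&queue, &rendered] {
        PendingFrame frame;
        while (queue.pop(frame)) {
            rendered.push_back(frame.sync_point.get());  // stand-in for SyncOperation
        }
    });

    for (int i = 0; i < frame_count; ++i) {
        // Stand-in for DecodeFrameAsync: the "decode" just returns the frame index.
        queue.push(PendingFrame{i, std::async(std::launch::async, [i] { return i; })});
    }
    queue.close();
    consumer.join();

    assert(rendered.size() == static_cast<std::size_t>(frame_count));
    return rendered;
}
```

In a real decoder the producer would also have to stop when it runs out of free surfaces, which this sketch leaves out.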


You've probably seen this already, but the Media SDK Tutorial has a section on efficient decode. Its samples don't include rendering. I'm not aware of any data that has been collected on the efficiency of rendering from the same thread versus a different thread, but based on other work with Media SDK the decision for how to implement could be informed by Occam's Razor -- the simplest application code often has the best performance.

I agree with your concerns about adding extra complexity.  It seems like multiple threads would add a lot of corner cases, development time, and maintenance costs with relatively minor theoretical benefit. The more complex code may be slower.

Regards, Jeff

Quote:

Jeffrey Mcallister (Intel) wrote:
I agree with your concerns about adding extra complexity.  It seems like multiple threads would add a lot of corner cases, development time, and maintenance costs with relatively minor theoretical benefit. The more complex code may be slower.

Thanks for your input. My concern is that DecodeFrameAsync is an asynchronous method; by using it synchronously (calling SyncOperation immediately after), doesn't it defeat the purpose of having an asynchronous method in the first place? I mean, by its nature the method suggests you shouldn't synchronize immediately every time. Why is the method asynchronous if it's designed to be used synchronously?

Anyway, I guess I'll have to run precise benchmarks to get an idea of how much time is spent blocking on SyncOperation (which a 2-thread approach might drastically reduce).

Hi dr_asik,

An asynchronous pipeline, even one designed to work in a single thread (e.g. as in sample_encode), does give a significant performance benefit. So if your app doesn't require each frame immediately after it has been decoded, an asynchronous pipeline is recommended.
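A minimal sketch of what a single-threaded asynchronous pipeline could look like is below. It is not Media SDK code: `std::async` and `std::future` are assumed stand-ins for DecodeFrameAsync and the sync point, and `submit_decode`, `decode_pipelined` and the `depth` parameter are hypothetical names. The idea is to keep several operations in flight and only wait on the oldest one once the pipeline is full.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for one asynchronous decode. The returned future plays
// the role of the sync point; calling get() plays the role of SyncOperation.
std::future<int> submit_decode(int frame_index) {
    return std::async(std::launch::async, [frame_index] {
        std::this_thread::sleep_for(std::chrono::milliseconds(2));  // simulated decode work
        return frame_index;
    });
}

// Decodes frame_count frames from one thread while keeping at most `depth`
// operations in flight. depth == 1 is the "submit, then sync right away" model.
std::vector<int> decode_pipelined(int frame_count, std::size_t depth) {
    assert(depth > 0);
    std::deque<std::future<int>> in_flight;
    std::vector<int> output;

    for (int i = 0; i < frame_count; ++i) {
        if (in_flight.size() == depth) {
            // Pipeline is full: wait on the oldest operation only.
            output.push_back(in_flight.front().get());
            in_flight.pop_front();
        }
        in_flight.push_back(submit_decode(i));
    }

    // Drain whatever is still in flight, preserving submission order.
    while (!in_flight.empty()) {
        output.push_back(in_flight.front().get());
        in_flight.pop_front();
    }
    return output;
}
```

With a depth greater than 1, the wait on the oldest operation overlaps with the work already submitted behind it, so the calling thread likely blocks for less time per frame. The cost is that frames are delivered up to `depth - 1` submissions later.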

Regards,

Nina

So I benchmarked it, and the actual call to SyncOperation takes about half a millisecond on average. It's not exactly free, but I have more pressing performance concerns elsewhere in the app. Looks like I'll stick with the synchronous approach for now.
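For anyone wanting to take the same kind of measurement, a timing helper along these lines would do. This is a sketch, not the exact code I ran: `average_block_ms` is a hypothetical name, and the callable you pass in would wrap your own SyncOperation call.

```cpp
#include <cassert>
#include <chrono>

// Average wall-clock time, in milliseconds, spent inside blocking_call()
// over `iterations` calls. steady_clock is used because it is monotonic.
template <typename Fn>
double average_block_ms(Fn blocking_call, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::milli> total{0.0};

    for (int i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        blocking_call();  // e.g. a lambda wrapping the wait on a sync point
        total += clock::now() - start;
    }
    return total.count() / iterations;
}
```

Averaging over many frames matters here, since individual waits can vary a lot depending on how far along the decode already is when the wait starts.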
