Encoding directly from video memory

Jack Chimasera

Hello

My current project requires me to compose images in GPU memory and then encode them using Quick Sync. I was wondering whether, using MFX_IOPATTERN_IN_VIDEO_MEMORY, I can avoid copying the images to system memory before encoding. If so, what command should I use to copy the render target I have composed into the encoder's surface? Will StretchRect do it?
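A minimal sketch of what this setting looks like in the Media SDK API, assuming an initialized hardware session and a D3D9 device manager already created by the application (the name `d3dDeviceManager` is a placeholder; this is a configuration fragment, not a complete program):

```cpp
// Sketch only: assumes the Media SDK headers (mfxvideo++.h) and an
// application-created IDirect3DDeviceManager9* named d3dDeviceManager.
#include <mfxvideo++.h>

MFXVideoSession session;
session.Init(MFX_IMPL_HARDWARE, nullptr);
// Let the SDK share the application's D3D9 device:
session.SetHandle(MFX_HANDLE_DIRECT3D_DEVICE_MANAGER9, d3dDeviceManager);

mfxVideoParam encPar = {};
encPar.mfx.CodecId          = MFX_CODEC_AVC;
encPar.mfx.FrameInfo.FourCC = MFX_FOURCC_NV12;  // encoder input must be NV12
// The key setting: input surfaces stay in video (D3D9) memory, no sysmem copy.
encPar.IOPattern            = MFX_IOPATTERN_IN_VIDEO_MEMORY;
```

With this IOPattern the encoder consumes mfxFrameSurface1 structures whose data lives in D3D9 surfaces supplied through an external frame allocator.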

regards

Jack Chimasera

Petter Larsson (Intel)

Hi Jack,

That approach should work just fine.

Let us know if you encounter any issues.

Regards,
Petter

Jack Chimasera

Hello Petter
I will try this approach. Will StretchRect from an RGB32 render target into QSV's NV12 surface perform the necessary colour conversion?

regards

Jack

Jack Chimasera

Hello Petter
Another question, if possible: above I have assumed that the GPU will handle the RGB->YUV conversion necessary before compression. As my application requires very high accuracy, can you tell me which conversion formula is used (i.e. the exact multipliers used by the GPU), and how exactly the 4:4:4 -> 4:2:0 conversion is handled (i.e. where, within each 2x2 pixel range, the particular U,V values are sampled from)?
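For reference, the conversion most commonly used for this kind of content is BT.601 limited range; whether Intel's hardware uses exactly these fixed-point multipliers (or BT.709, or full range), and which 2x2 chroma siting it applies, is driver-dependent and is exactly what the question above asks. The sketch below shows the standard BT.601 integer approximation and one common averaging policy for 4:2:0, as a reference point only, not as the GPU's exact math:

```cpp
#include <cstdint>
#include <vector>

// Reference BT.601 limited-range RGB -> YCbCr, 8-bit fixed point.
// NOTE: assumed for illustration; the GPU's actual multipliers may differ.
struct YUV { uint8_t y, u, v; };

static YUV RgbToYuvBt601(uint8_t r, uint8_t g, uint8_t b) {
    int y = ((  66 * r + 129 * g +  25 * b + 128) >> 8) + 16;   // 16..235
    int u = (( -38 * r -  74 * g + 112 * b + 128) >> 8) + 128;  // 16..240
    int v = (( 112 * r -  94 * g -  18 * b + 128) >> 8) + 128;
    return { (uint8_t)y, (uint8_t)u, (uint8_t)v };
}

// One common 4:4:4 -> 4:2:0 policy: average U,V over each 2x2 block.
// (Another common policy is to take a single co-sited sample; which one
// the hardware uses is the open question in this thread.)
static void SubsampleUV420(const std::vector<YUV>& src, int w, int h,
                           std::vector<uint8_t>& uPlane,
                           std::vector<uint8_t>& vPlane) {
    uPlane.resize((w / 2) * (h / 2));
    vPlane.resize((w / 2) * (h / 2));
    for (int by = 0; by < h / 2; ++by)
        for (int bx = 0; bx < w / 2; ++bx) {
            int sumU = 0, sumV = 0;
            for (int dy = 0; dy < 2; ++dy)
                for (int dx = 0; dx < 2; ++dx) {
                    const YUV& p = src[(2 * by + dy) * w + (2 * bx + dx)];
                    sumU += p.u;
                    sumV += p.v;
                }
            uPlane[by * (w / 2) + bx] = (uint8_t)((sumU + 2) / 4);  // rounded
            vPlane[by * (w / 2) + bx] = (uint8_t)((sumV + 2) / 4);
        }
}
```

With these coefficients, white (255,255,255) maps to Y=235 and black to Y=16, the nominal BT.601 limited-range endpoints.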

regards

Jack

camkego

Jack,

I may be able to offer some helpful input here.

1. If you are composing the images/surfaces in GPU memory, you may want to consider encoding straight from a Direct3D surface.
see: "C:\Program Files\Intel\Media SDK 2012 R3\samples\sample_encode\readme-encode.rtf"

This may add a lot of work and complexity, though, so it all depends on your performance needs and how much effort you can expend to get it working right.

2. 'sample_encode' from Intel only accepts NV12 and YUV420 video. If you are composing 4:4:4 RGB images, they will need to be converted before calling EncodeFrameAsync.
Some options here are:
- use the VPP module as part of a pipeline to do RGBA->NV12
[I think it will work great on D3D surfaces] (see sample_vpp)
- call a routine from the Intel Integrated Performance Primitives (IPP) library before using a pipeline without VPP color conversion
[best on system memory only]
- do the color conversion by hand using your own RGBA->NV12 converter (this may be really slow accessing GPU memory)
[best for system memory only]
- use IDirect3DDevice9::StretchRect to do the color conversion.
[I do not think this will work for you, as it appears to support YUV->RGBA only, not RGB->YUV, and without software emulation]
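The VPP option above can be sketched roughly as follows, assuming an initialized MFXVideoSession and Media SDK headers; the sizes and frame rate are illustrative values for the 1080p60 case discussed later in the thread, and this is a configuration fragment, not a complete pipeline:

```cpp
// Sketch only: VPP parameters for RGB32 (RGB4) -> NV12 conversion
// entirely on D3D9 video-memory surfaces.
#include <mfxvideo++.h>

mfxVideoParam vppPar = {};
vppPar.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY |
                   MFX_IOPATTERN_OUT_VIDEO_MEMORY;

vppPar.vpp.In.FourCC        = MFX_FOURCC_RGB4;          // 32-bit RGB input
vppPar.vpp.In.ChromaFormat  = MFX_CHROMAFORMAT_YUV444;
vppPar.vpp.In.PicStruct     = MFX_PICSTRUCT_PROGRESSIVE;
vppPar.vpp.In.Width  = 1920; vppPar.vpp.In.Height = 1088; // aligned surface
vppPar.vpp.In.CropW  = 1920; vppPar.vpp.In.CropH  = 1080; // visible region
vppPar.vpp.In.FrameRateExtN = 60; vppPar.vpp.In.FrameRateExtD = 1;

vppPar.vpp.Out = vppPar.vpp.In;                         // same size/rate...
vppPar.vpp.Out.FourCC       = MFX_FOURCC_NV12;          // ...but NV12 out
vppPar.vpp.Out.ChromaFormat = MFX_CHROMAFORMAT_YUV420;

// Then: MFXVideoVPP vpp(session); vpp.Init(&vppPar);
// and RunFrameVPPAsync(...) per frame, feeding its output to the encoder.
```

The VPP output surfaces can then be passed straight to EncodeFrameAsync without ever leaving video memory.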

I hope this is helpful, please let us know what you get working.

Regards,
Cameron Elliott

Jack Chimasera

Hello Cameron
Thank you for your input.
My most urgent need right now is high performance: I need to encode 1920x1080 material at 60 fps, and if possible two streams of that form in parallel, on a modern Ivy Bridge CPU. I am prepared to put a great deal of effort into making this work properly. A previous implementation, which read back every frame to the CPU and then used IPP to convert it to NV12, achieved under 50% of the minimum necessary performance, on account of a slow readback by Direct3D9's GetRenderTargetData method.
I have read sample_encode's documentation and source, but it is plainly visible there that the GPU-bound surfaces are filled by locking them and writing from the CPU, whereas I need to fill them with data from a surface on the GPU, as you have understood.
Regarding the options you have suggested :
VPP: I will check whether VPP can perform RGBA->NV12 without CPU intervention. If it can, this may just be my ticket.
IPP: As mentioned above, IPP requires having the RGB32 frames in CPU-accessible space, which is far too costly. The same goes for writing my own RGB32->NV12 routine.
StretchRect: I was unsure whether this method supports RGB32->NV12 on Intel's GPUs, which is why I posted the query to begin with.
Thank you for the effort, I will check VPP.

regards

Jack Chimasera

Jack Chimasera

VPP indeed appears to be the answer for RGB32 -> NV12 conversion. I just hope its input frames can be allocated in a way that will allow me to render into them using Direct3D.

Petter Larsson (Intel)

Hi Jack,

Not sure whether you were able to make progress on this topic. As you concluded, Media SDK VPP is likely a good approach to take care of the RGB32->NV12 color conversion.

From your descriptions I now understand your environment better. In fact, StretchRect is quite limited on DX9 and may not work for your purpose. We have found that in some situations a CPU-assisted GPU->GPU surface copy may be required. Locking the surface and performing a brute-force row-by-row copy of an RGB surface can be a very large bottleneck. In that case, please explore an efficient "fast copy" method such as the one described here:
http://software.intel.com/en-us/articles/copying-accelerated-video-decod...

Using such an approach you will achieve much better performance than with the brute-force approach.

Regards,
Petter

Jack Chimasera

Thank you, Petter !
I've used the code from the article you linked (after changing the buffer size from 4K to 8K, due to a very big render-target size at RGB32), and now I have finally managed to do 1920x1080 at 60 fps.
I still plan to try the VPP path, but it's a good deal less urgent now !
