Intel Media SDK Tutorial - simple_4_vpp_resize_procamp

In this Intel® Media SDK tutorial sample we show how to use the Intel Media SDK VPP component to process frame surfaces, in this case by resizing frames and applying ProcAmp adjustments.

We start with the simplest Intel Media SDK VPP workload, one that uses system memory frame surfaces.


The snapshot above was captured using the Intel® GPA “Media Performance” dialog and shows GPU performance while the workload runs. As expected, GPU utilization is only marginal. For tutorial snapshot benchmarks comparing all workloads analyzed with Intel GPA, see this page.

Let’s explore what causes such low GPU utilization by examining the workload trace below.


  • As with the previous workloads, the large gaps in the “GPU EU Queue” track show that the GPU is poorly utilized. Note that the “GPU MFX Queue” does not appear in the trace, since VPP operations are processed exclusively on the GPU EUs.
  • The trace shows the RunFrameVPPAsync() call returning immediately, as expected, leading to the “VPP Submit” execution in the “simple_vpp.exe” track. RunFrameVPPAsync() is followed by SyncOperation(), which, because the workload is implemented synchronously, waits until the frame has been completely processed.
  • The performance impact of copying between system memory and D3D memory is captured by the “VPP Submit” execution in the “simple_vpp.exe” track, although the sub-traces in this track do not make it fully clear what is happening. There is a large gap (see “A” in the trace above) before the “DXVA2_Execute” operation, which submits the VPP task to the GPU. Even though this is not explicitly labeled in the GPA trace, the delay here is caused by copying the VPP input surface from system memory to D3D memory. The copy required for the generated VPP output surface is shown explicitly in the trace as the “FastCopySSE” operation.
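To make the effect of the synchronous structure concrete, here is a rough C++ model, assuming that for each frame the input copy, the GPU VPP work, and the output copy run strictly one after another because the next RunFrameVPPAsync() call is only made after SyncOperation() returns. The `FrameStages` struct and `gpuBusyFraction()` function are hypothetical illustrations, not part of the Intel Media SDK, and any durations plugged into them are placeholders rather than values read from the GPA trace.

```cpp
#include <cassert>

// Per-frame stage durations in milliseconds (hypothetical placeholders,
// not measurements from the GPA trace).
struct FrameStages {
    double copyIn;   // system memory -> D3D memory copy of the VPP input (gap "A")
    double gpuVpp;   // VPP work executed on the GPU EUs
    double copyOut;  // D3D memory -> system memory copy of the output ("FastCopySSE")
};

// Fraction of wall-clock time the GPU is busy when frames are processed
// strictly one at a time: the copies and the GPU work never overlap, so the
// GPU is idle for the whole duration of both copies.
double gpuBusyFraction(const FrameStages& s) {
    assert(s.copyIn >= 0.0 && s.gpuVpp >= 0.0 && s.copyOut >= 0.0);
    const double frameTime = s.copyIn + s.gpuVpp + s.copyOut;
    return frameTime > 0.0 ? s.gpuVpp / frameTime : 0.0;
}
```

For example, placeholder stages of 2 ms, 1 ms, and 2 ms give a busy fraction of 0.2, while removing both copies raises it to 1.0 in this simplified model. A real pipeline still has other per-frame overheads that the model ignores.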

To address the performance issues above, let’s explore a modified VPP workload that uses D3D memory surfaces instead.
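As a rough illustration of what switching to D3D memory surfaces removes, the sketch below estimates the bytes moved per frame by the SDK-internal copies between system memory and D3D memory. It assumes NV12 surfaces (12 bits per pixel) with even dimensions and ignores surface padding and alignment. The function names are hypothetical, and the two booleans merely stand in for choosing the `MFX_IOPATTERN_IN_SYSTEM_MEMORY` / `MFX_IOPATTERN_OUT_SYSTEM_MEMORY` IOPattern flags rather than their `..._VIDEO_MEMORY` counterparts.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one NV12 frame: a full-resolution Y plane plus a
// half-resolution interleaved UV plane. Assumes even dimensions and no padding.
std::uint64_t nv12FrameBytes(std::uint32_t width, std::uint32_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    return static_cast<std::uint64_t>(width) * height * 3 / 2;
}

// Bytes copied per frame between system memory and D3D memory by the SDK.
// With system memory surfaces, the input is uploaded and the output is read
// back; with D3D (video) memory surfaces on a side, that copy is not needed.
std::uint64_t copyBytesPerFrame(std::uint32_t inW, std::uint32_t inH,
                                std::uint32_t outW, std::uint32_t outH,
                                bool inSystemMemory, bool outSystemMemory) {
    std::uint64_t bytes = 0;
    if (inSystemMemory)  bytes += nv12FrameBytes(inW, inH);    // upload of VPP input
    if (outSystemMemory) bytes += nv12FrameBytes(outW, outH);  // readback of VPP output
    return bytes;
}
```

For a hypothetical 1920x1080 to 1280x720 resize, system memory surfaces on both sides imply 3,110,400 + 1,382,400 bytes of copying per frame, versus none from the SDK when both sides use D3D memory. The application may still need to map D3D surfaces to read or write frame data, which this sketch does not model.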

This tutorial sample is found in the tutorial samples package under the name "simple_4_vpp_resize_procamp". The code is extensively documented with inline comments detailing each step required to set up and execute the use case.


For more complete information about compiler optimizations, see the Optimization Notice.