We have now been using the Intel Media SDK for a number of years with great success. Recently, however, we have run into a rather major issue that we have been unable to resolve, and we would appreciate your feedback on it.
In short, we push your platforms pretty hard, using significant CPU and memory bandwidth for real-time video processing. We then use the Media SDK to perform hardware-assisted H.264 encoding; however, the moment we start encoding a 1920x1080 interlaced video stream, the performance of the entire system drops dramatically. All processes, even ones that are not using the Media SDK, seem to immediately start using almost twice as much CPU. When we encode at lower resolutions (e.g. 720x480) we do not see the same problem.
I have attached an example CPU plot that shows the problem. Basically, if we run our application as normal it uses some level of CPU. When we then start recording using the Intel Media SDK, even a single stream of video makes the overall CPU usage jump dramatically. What cannot be seen on this plot is that the CPU usage of the Media SDK process itself is actually relatively low, but while it is running the "CPU time" spent in all the other processes increases a lot, presumably because the overall CPU memory bandwidth drops and so reads and writes tend to stall.
I have made a test application that demonstrates at least part of this problem. It basically works as follows:
1) It launches 8 threads doing memcpy calls and measures the throughput achieved on each one.
2) After 30 seconds it starts writing a 1080i M4V file to disk using the Intel Media SDK with hardware encoding.
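For anyone who wants to reproduce the memcpy half of this without the attachment, the idea can be sketched roughly as below. This is a minimal illustration and not the actual test application: the function names, buffer size and measurement window are assumptions, the Media SDK recording step is left out entirely, and the sketch reports gigabytes per second, which may or may not be the same unit as the figures in this post.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper: copies one buffer into another repeatedly for about
// `seconds` and returns the throughput in GB/s (10^9 bytes per second).
// Use a buffer much larger than the last-level cache so that the copies
// exercise DRAM rather than cache.
double measure_memcpy_gbps(std::size_t buffer_bytes, double seconds) {
    assert(buffer_bytes > 0 && seconds > 0.0);
    std::vector<unsigned char> src(buffer_bytes, 1), dst(buffer_bytes, 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::size_t copied = 0;
    double elapsed = 0.0;
    do {
        std::memcpy(dst.data(), src.data(), buffer_bytes);
        copied += buffer_bytes;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < seconds);
    // Read the destination so the copies cannot be optimised away.
    volatile unsigned char sink = dst[buffer_bytes - 1];
    (void)sink;
    return static_cast<double>(copied) / elapsed / 1e9;
}

// Hypothetical helper: runs `thread_count` copy loops concurrently and
// returns the throughput measured by each thread, indexed by thread number.
std::vector<double> run_memcpy_threads(unsigned thread_count,
                                       std::size_t buffer_bytes,
                                       double seconds) {
    std::vector<double> results(thread_count, 0.0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < thread_count; ++i) {
        // Each thread writes only its own element, so no locking is needed.
        workers.emplace_back([&results, i, buffer_bytes, seconds] {
            results[i] = measure_memcpy_gbps(buffer_bytes, seconds);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Calling something like run_memcpy_threads(8, 256u << 20, 1.0) in a loop and printing each element, first with no encode running and then with a 1080i hardware encode running in another process, should show whether the same drop appears on your system.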
What you see is basically the following:
Thread 1, memcpy = 1.47Gb/s
Thread 3, memcpy = 1.44Gb/s
Thread 2, memcpy = 1.43Gb/s
Thread 5, memcpy = 1.43Gb/s
Thread 0, memcpy = 1.43Gb/s
Thread 6, memcpy = 1.42Gb/s
Thread 7, memcpy = 1.36Gb/s
Thread 4, memcpy = 1.36Gb/s
[ snip ]
Starting to record to disk using Intel Media SDK ...
[ snip ]
Thread 1, memcpy = 0.99Gb/s
Thread 5, memcpy = 1.21Gb/s
Thread 0, memcpy = 0.95Gb/s
Thread 3, memcpy = 1.41Gb/s
Thread 7, memcpy = 1.07Gb/s
Thread 4, memcpy = 1.17Gb/s
Thread 2, memcpy = 1.11Gb/s
Thread 6, memcpy = 1.11Gb/s
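In other words, memory performance drops by about 30% on the worst-affected threads while the Media SDK is in use (roughly 20% averaged across the eight threads in the sample above).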
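To make the size of the drop easy to check, the per-thread figures from the sample output can be reduced to an average drop and a worst-case drop. The snippet below is only a sketch of that arithmetic; the struct and function names are made up for illustration, and the two arrays are simply the numbers printed above, re-ordered by thread number.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical summary type: percentage drop in aggregate throughput and
// the largest percentage drop seen on any single thread.
struct DropSummary {
    double mean_drop_pct;
    double worst_drop_pct;
};

// Figures from the sample output, indexed by thread number 0..7.
constexpr std::array<double, 8> kBefore = {1.43, 1.47, 1.43, 1.44, 1.36, 1.43, 1.42, 1.36};
constexpr std::array<double, 8> kAfter  = {0.95, 0.99, 1.11, 1.41, 1.17, 1.21, 1.11, 1.07};

template <std::size_t N>
DropSummary summarize_drop(const std::array<double, N>& before,
                           const std::array<double, N>& after) {
    const double sum_before = std::accumulate(before.begin(), before.end(), 0.0);
    const double sum_after = std::accumulate(after.begin(), after.end(), 0.0);
    double worst = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        // Relative loss on thread i, as a percentage of its original rate.
        worst = std::max(worst, (before[i] - after[i]) / before[i] * 100.0);
    }
    return {(sum_before - sum_after) / sum_before * 100.0, worst};
}
```

With the numbers above, summarize_drop(kBefore, kAfter) gives an aggregate drop of about 20.5% and a worst single-thread drop of about 33.6% (thread 0, from 1.43 to 0.95).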
Our best guess is that memory bandwidth across the system drops dramatically when the hardware encoder is in use, and we would appreciate any guidance that you might have in either diagnosing or avoiding this issue.
Cary Tetrick on behalf of Andrew Cross.