I'd like to decode a real-time video stream (currently I'm using a video file for testing) that is 1920x1080 progressive H.264 for video analysis. I have a custom video-analysis DirectShow filter that takes a few milliseconds per frame. Just using QuickSync to decode, VPP to convert to RGB32, and then storing the frame takes 8 to 10 milliseconds. After the analysis, writing to the frame and storing the modified frames back into system memory takes up another 5-8 msec (I'm running on a Sandy Bridge quad-core i7 CPU). As a result I can't maintain the 60 FPS analysis rate that real time will require; it is closer to about 30 frames, as there is also overhead for a splitter etc. Switching to a vanilla Haswell chip might gain 20%, but that would not be sufficient; I'm looking for a 100% or greater speedup. Since a fair amount of time is taken up by the memory transfers/copies (i.e. I believe I'm primarily memory bound), it occurred to me that a Haswell chip with Iris Pro Graphics, with its much higher memory bandwidth (using the 128 MB EDRAM), may be the solution. However, the QuickSync QueryIOSurf call recommends 32 surfaces, and those would not fit into the 128 MB EDRAM.
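For context, here is the back-of-envelope arithmetic behind my "memory bound" suspicion: the per-frame budget at 60 FPS versus the raw traffic of moving full RGB32 frames around. The numbers are illustrative only; the actual traffic depends on exactly how many full-frame copies my pipeline makes.

```python
# Rough frame-budget and memory-traffic arithmetic for 1080p60 RGB32.
# Illustrative only; real traffic depends on how many full-frame
# copies the pipeline actually performs.

WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4            # RGB32
FPS = 60

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL    # 8,294,400 bytes (~7.9 MiB)
budget_ms = 1000.0 / FPS                          # ~16.7 ms per frame

# Bandwidth consumed by ONE full-frame copy per frame at 60 FPS:
copy_bandwidth_mib_s = frame_bytes * FPS / 2**20  # ~475 MiB/s

# My pipeline does at least two full-frame transfers per frame
# (store after decode/VPP, then store again after analysis), so the
# effective traffic is a multiple of this figure.
print(f"frame size:         {frame_bytes / 2**20:.1f} MiB")
print(f"frame budget:       {budget_ms:.2f} ms")
print(f"per-copy bandwidth: {copy_bandwidth_mib_s:.0f} MiB/s")
```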
Is there a way to get the decoder to use the EDRAM for as many surfaces as will fit in the 128 MB and put the rest in system memory? Would this speed up the decode/VPP?
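To put numbers on the "would not fit" concern, here is my rough surface-footprint arithmetic. I'm assuming NV12 decode surfaces with the height aligned up to 1088, plus RGB32 VPP output surfaces; the exact set of surface pools the SDK allocates is an assumption on my part.

```python
# Rough EDRAM footprint arithmetic for the 32 surfaces suggested by
# QueryIOSurf. Assumes NV12 decode surfaces (12 bits/pixel) with the
# height aligned up to 1088, plus RGB32 VPP output surfaces; the
# actual allocation pattern is an assumption on my part.

WIDTH, ALIGNED_HEIGHT = 1920, 1088   # 1080 rounded up for surface alignment
NUM_SURFACES = 32
EDRAM_BYTES = 128 * 2**20            # 128 MiB of EDRAM

nv12_surface = WIDTH * ALIGNED_HEIGHT * 3 // 2   # 3,133,440 bytes (~3 MiB)
rgb32_surface = WIDTH * ALIGNED_HEIGHT * 4       # 8,355,840 bytes (~8 MiB)

decode_pool = NUM_SURFACES * nv12_surface        # ~96 MiB
vpp_pool = NUM_SURFACES * rgb32_surface          # ~255 MiB

print(f"NV12 decode pool: {decode_pool / 2**20:.1f} MiB")
print(f"RGB32 VPP pool:   {vpp_pool / 2**20:.1f} MiB")
print(f"total:            {(decode_pool + vpp_pool) / 2**20:.1f} MiB "
      f"vs {EDRAM_BYTES / 2**20:.0f} MiB EDRAM")
```

By this estimate the NV12 decode pool alone might just fit, but adding the RGB32 VPP outputs blows well past 128 MB, which is why a split between EDRAM and system memory seems attractive.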
In general, is there any documentation available on how to target the EDRAM from QuickSync, as well as from general CPU code? All I was able to find searching the Internet was a reference at the last IDF to using the Intel drivers, and some vague comments about sharing it with the CPU. Surely there must be some info I missed that explains how to do this in detail.
Any help would be appreciated.