I am developing a DirectX-based medical imaging application that uses volume rendering. We used to require discrete video cards, but we are now tweaking/re-engineering it to work on recent Intel processors with integrated graphics (HD Graphics as well as Ivy Bridge/Sandy Bridge).
We don't appear to be CPU bound. From GPA, I know that we spend about 1% of the time in the vertex shader. During continuous rendering, the pixel shader is about 50% utilized and appears to be stalled the other half of the time. We are sampling volume textures A LOT. As the title says, the texture sampler is busy 95% of the time. I suspect this is due to memory latency, but I don't know how to confirm this. I did not find a counter that indicates how long the sampler spends waiting for memory. There is a counter that indicates whether the sampler is "stalled", but it is near zero all the time.
So what would be the logical next step in the performance analysis? I would like to know if we are limited by the sampler, memory bandwidth, or both.
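To make the bandwidth half of the question concrete, this is the kind of back-of-the-envelope estimate I have in mind. It is only a sketch: the function name and every input below (resolution, samples per ray, texels per sample, bytes per texel, frame rate, cache hit rate) are placeholder assumptions, not measurements from our application. The only non-hypothetical figure is the 25.6 GB/s theoretical peak of dual-channel DDR3-1600.

```python
def texture_dram_traffic_gb_per_s(width, height, samples_per_ray,
                                  texels_per_sample, bytes_per_texel,
                                  fps, cache_hit_rate):
    """Rough estimate of DRAM traffic from volume texture fetches, in GB/s.

    All parameters are assumptions supplied by the caller; nothing here
    is read from the hardware or from GPA.
    """
    # One ray per pixel, each ray takes samples_per_ray texture samples.
    samples_per_second = width * height * samples_per_ray * fps
    # A trilinear sample touches up to 8 texels.
    bytes_requested = samples_per_second * texels_per_sample * bytes_per_texel
    # Only texture-cache misses actually reach memory.
    bytes_from_dram = bytes_requested * (1.0 - cache_hit_rate)
    return bytes_from_dram / 1e9


# Placeholder inputs (hypothetical, for illustration only).
estimate = texture_dram_traffic_gb_per_s(
    width=1024, height=768, samples_per_ray=500,
    texels_per_sample=8, bytes_per_texel=2,
    fps=10, cache_hit_rate=0.9)

PEAK_GB_PER_S = 25.6  # dual-channel DDR3-1600 theoretical peak
print(f"estimated traffic: {estimate:.2f} GB/s "
      f"({100 * estimate / PEAK_GB_PER_S:.0f}% of theoretical peak)")
```

If an estimate like this came out well below the peak, that would point toward latency or sampler throughput rather than raw bandwidth, but the cache hit rate is exactly the number I don't know how to measure.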
Thanks in advance.