# Absolute timings

Hi. I'm not sure "absolute" is the right term (I'm far from being fluent in English).

I'm trying to optimize some GPU code, using GPA to test the results, especially on a subset of the whole frame:

- [A] 1 quad (simple VS using VertexID to generate 3 vertices to draw a quad, with Scissor so that only the quad is rendered) with some computations in the PS (1 RT)
- [B] 1 rendering (dummy VS, complex GS using PrimitiveID as input and reading the previous quad, simple PS rendering the generated triangles)

I was surprised that each time I benchmarked the code, GPA reported noticeable timing differences. So I recorded a frame and simply loaded it many times in GPA (always the same capture file). GPA seems to give 3 sets of values:

[A]

- GPU : 388 ( PS = 377 )
- GPU : 345 ( PS = 335 )
- GPU : 295 ( PS = 287 )

[B]

- GPU : 12122 ( GS = 7960 / PS = 4080 )
- GPU : 10777 ( GS = 7070 / PS = 3626 )
- GPU : 9195 ( GS = 6034 / PS = 3094 )

Over 10 reloads of the same capture file, the result always falls into one of these 3 cases (with some decimal differences, but always within this range).

I understand the small variations; they are logical (and not a problem anyway). But the big variations can reach 30%, which is quite annoying: when you're trying to benchmark, it's a bit worrying to have to test the same thing several times to work out an average value.

Moreover, I use an FPS counter in my application (very basic, giving the time spent on each frame). It shows some small variations, but far less than 5% (I never see a static scene taking 30 ms to render, then 40 ms the next frame).

My test computer (at home) is a Sandy Bridge (i3-2330M) laptop. Maybe it's some kind of power/frequency control issue? If so, would it be possible to ensure that the GPU runs at full speed while GPA is doing its analysis? Is there any workaround?

Thanks in advance :)

[I can email you the capture file if you want it]
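To put a number on the variation described above, here is a minimal Python sketch that computes how much slower the worst case is than the best case for each draw, along with the ratio between neighbouring cases. The values are the GPU timings quoted in this post; the helper names `relative_spread` and `step_ratios` are made up for this illustration and are not part of GPA.

```python
# GPU timings reported by GPA across reloads of the same capture file
# (values copied from the post; units as displayed by GPA).
timings = {
    "A": [388, 345, 295],
    "B": [12122, 10777, 9195],
}


def relative_spread(values):
    """Return (worst - best) / best: how much slower the worst case is."""
    best, worst = min(values), max(values)
    return (worst - best) / best


def step_ratios(values):
    """Return the ratios between neighbouring cases, slowest first."""
    ordered = sorted(values, reverse=True)
    return [slow / fast for slow, fast in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for name, values in timings.items():
        ratios = ", ".join(f"{r:.3f}" for r in step_ratios(values))
        print(f"[{name}] spread = {relative_spread(values):.1%}, step ratios = {ratios}")
```

For these numbers the spread is a little over 30% for both draws, and the step ratios for [A] and [B] come out close to each other. That would be consistent with a factor affecting the whole frame (for example a clock frequency), although it does not prove it.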


By the way, I upgraded to the latest drivers (8.15.10.2761) from the previous ones, and there are one or two strange things. (The latest driver is available in the "Processor Graphics" section, not in the "Laptop graphic drivers" section; that's why I wasn't up to date.)

Running the SAME code is slower with the new drivers:

- best case: was GPU : 9195 ( GS = 6034 / PS = 3094 ), now is GPU : 10233 ( GS = 6713 / PS = 3444 )
- medium case: was GPU : 10777 ( GS = 7070 / PS = 3626 ), now is GPU : 11279 ( GS = 6784 / PS = 3474 )
- worst case: was GPU : 12122 ( GS = 7960 / PS = 4080 ), now is GPU : 12633 ( GS = 7963 / PS = 4086 )

(The worst-case GS and PS values being similar, I assume that the driver's shader compiler [HLSL asm => GPU code] has not changed?)
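As a rough way to quantify the driver comparison above, the following Python sketch computes the percentage change of the total GPU timing for each case. The numbers are the ones quoted in this post; `percent_change` is a hypothetical helper written for this illustration.

```python
# Total GPU timings for draw [B], before and after the driver upgrade
# (values copied from the post).
old_driver = {"best": 9195, "medium": 10777, "worst": 12122}
new_driver = {"best": 10233, "medium": 11279, "worst": 12633}


def percent_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for case in old_driver:
        change = percent_change(old_driver[case], new_driver[case])
        print(f"{case:>6}: {old_driver[case]} -> {new_driver[case]} ({change:+.1f}%)")
```

With these values the best case slows down the most (roughly 11%), while the medium and worst cases change by less than 5%.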

Hello,

Thanks for pointing this out... let me see what I can do to help with this.

First of all, what version of Intel GPA are you using? Please be sure that you are using the R4 release (it came out about a week ago), as we have made some changes in the measurement code. I'm not sure that your specific test case would see any changes, but let's be sure.

Secondly, Intel GPA already makes multiple passes when it calculates the metrics values (this is why you'll see the "variance", so be sure to set "show metrics values range"). So hopefully we already help with this, and as you mentioned, such a large variance seems unusual.

Thirdly, let me check on the GPU frequency variance due to CPU/GPU power tradeoffs -- we've had some discussion on this over time, and I want to be sure I have the latest information for you.

Regards,

Neal

Hello again,

I did find out that we do lock the GPU frequency at the maximum possible during playback, to avoid potential issues like the one you described.

So the next thing is for me to get a copy of your capture file, and we can do some more analysis of this.

Regards,

Neal

Hello,

I've done some experiments myself with some ergs on a sample capture file, and I can't duplicate the variation that you're seeing. Are the other metrics showing similar variation? For example, if any of the memory metrics are showing significant variation, this might explain some of the difference (that is, it depends on how you're hitting the cache, etc.).

Also, can you provide some information on how you got your numbers? That is, did you just select/deselect the ergs within the same run of Intel GPA, or did you completely quit the Intel GPA application and run it again?

If you can copy the "About..." configuration info here and the frame capture file I will have the development team take a look at it.

Regards,

Neal

P.S. If you want, I can send you a private email to use for getting me a copy of the file, or we can try the Intel FTP server (where only Intel personnel can read the file).