New 24.20.100.6094 Win10 driver performance regression from .6025

My suite of kernels, when compiled (to binaries) with the .6094 driver on Win10/x64, takes almost twice as long to execute as the same kernels compiled with .6025.

Compiling on .6025 and executing on .6094 shows no regression.

Compiling on .6094 and executing on .6094 or .6025 shows the huge performance drop.

Inspection of the assembly produced by .6094 shows long sequences of MOV operations that I believe are unnecessary.

I wish there were a better way to report performance regressions (and reproducers) than here or the GitHub issues page (which is very quiet).

-ASM

The .6136 driver is also spilling a LOT of registers on kernels that compile without any spills on .6025.

I'm using __attribute__((intel_reqd_sub_group_size(8))) so there should be plenty of registers.
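For anyone following along, here is a minimal sketch of how that attribute is applied. The kernel `scale` and its arguments are hypothetical and are not one of the affected kernels; the sketch assumes a device that supports the cl_intel_required_subgroup_size extension.

```opencl
// Hypothetical kernel for illustration only; not taken from the affected suite.
// Assumes the device reports the cl_intel_required_subgroup_size extension.
__attribute__((intel_reqd_sub_group_size(8)))
__kernel void scale(__global const float* in,
                    __global float* out,
                    const float k)
{
    // One work-item per element; the attribute requests a sub-group
    // size of 8 for this kernel.
    const size_t gid = get_global_id(0);
    out[gid] = in[gid] * k;
}
```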

Today's new driver (.6194) exhibits the same regression. 

As noted above, .6025 works perfectly.

All kernels listed here are decorated with intel_reqd_sub_group_size(8).

Let me know who I can send the kernels to.

Here is a link to the Intel Graphics Compiler project:

https://github.com/intel/intel-graphics-compiler

The compiler development team monitors it, so filing an issue there may help.

Done, thanks!

Hi AllanM,

Thanks for the detail. If your reproducer source is privileged, it can be submitted confidentially through the Intel Service Center. I can route it to the devs from there. For OpenCL, I recommend marking it as Media Server Studio/Media SDK related or Intel System Studio related.

https://software.intel.com/en-us/support/priority-support

 

Thank you

-MichaelC
