My suite of kernels, compiled to binaries with the .6094 driver on Win10/x64, takes almost twice as long to execute as the same kernels compiled with .6025.
Compiling on .6025 and executing on .6094 shows no regression.
Compiling on .6094 and executing on either .6094 or .6025 shows the huge performance drop, which suggests the regression is in the compiler rather than the runtime.
Inspection of the assembly produced by .6094 shows long sequences of MOV operations that I believe are unnecessary.
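
For anyone who wants to compare the two compilers' output numerically, here is a rough sketch of how the MOV sequences could be counted. It assumes the disassembly has been dumped to text with one instruction per line, an optional parenthesized prefix such as `(W)` before the mnemonic, `//` comments, and labels ending in `:`. That layout and the `mov_runs` helper are my own assumptions for illustration, not something any driver tool guarantees.

```python
import re

# Optional parenthesized prefixes such as "(W)", then the mnemonic.
# This listing format is an assumption; adjust the pattern to your dump.
_MNEMONIC = re.compile(r"^\s*(?:\([^)]*\)\s*)*([A-Za-z_][\w.]*)")


def mov_runs(listing):
    """Return (total_instructions, mov_count, longest_consecutive_mov_run)."""
    total = movs = run = longest = 0
    for line in listing.splitlines():
        line = line.split("//", 1)[0].strip()  # drop trailing comments
        if not line or line.endswith(":"):     # skip blank lines and labels
            continue
        match = _MNEMONIC.match(line)
        if not match:
            continue
        total += 1
        # Any mnemonic starting with "mov" counts (mov, movi, ...).
        if match.group(1).lower().startswith("mov"):
            movs += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return total, movs, longest
```

Running this over the .6025 and .6094 listings of the same kernel and comparing the three numbers would give a concrete figure to attach to a report.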
I wish there were a better way to report performance regressions (and reproducers) than posting here or on the GitHub issues page, which is very quiet.