What happens if DAZ bit is set but isn't supported?

Hello,

I've been profiling some SSE instructions on our target hardware and have stumbled onto the FTZ and DAZ flags. Turning on the FTZ flag greatly increases speed, and turning on DAZ increases it a bit more (for that first instruction that gets denormal input).
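For anyone following along, here is a minimal sketch of switching the two flags with the standard intrinsics headers. The helper names `set_ftz_daz` and `sse_mul` are made up for illustration, and the sketch assumes a CPU on which DAZ is available.

```c
#include <xmmintrin.h>  /* _mm_mul_ss, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE */

/* Hypothetical helper: turns both FTZ and DAZ on (nonzero) or off (zero)
   in the calling thread's MXCSR. Assumes the CPU supports DAZ. */
static void set_ftz_daz(int on)
{
    _MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
                                   : _MM_DENORMALS_ZERO_OFF);
}

/* Hypothetical helper: multiplies two floats with an SSE instruction.
   The volatile copies stop the compiler from folding the product at
   compile time, so the current MXCSR mode really applies. */
static float sse_mul(float a, float b)
{
    volatile float va = a, vb = b;
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));
}
```

With both flags off, `sse_mul(2e-38f, 0.25f)` gives a small denormal result. With them on, the same call should give zero, because FTZ flushes the denormal result, and `sse_mul(1e-40f, 2.0f)` should give zero because DAZ treats the denormal input as zero.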

This site is awesome, http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz, and it notes that the DAZ flag was not supported on earlier hardware. There's even a link to a document that tells me how to check for DAZ support. Out of curiosity, I have to ask: what happens if you try to set the DAZ bit on hardware that doesn't support it? Does the MXCSR register change? Or is it an unused bit, so that setting it is simply ineffective?
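For reference, the support check as I understand it from Intel's Software Developer's Manual reads the MXCSR_MASK field that FXSAVE writes at bytes 28 to 31 of its save area and tests bit 6. The manual also describes bit 6 as reserved on processors without DAZ, and says that writing a 1 to a reserved MXCSR bit raises a general-protection fault (#GP), so checking first looks like the safe route. Below is a rough sketch, assuming GCC or Clang on x86; `read_mxcsr_mask` and `try_enable_daz` are illustrative names, not anything from the linked document.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

#define MXCSR_DAZ_BIT 0x0040u  /* bit 6 of MXCSR */

/* FXSAVE needs a 512-byte save area aligned to 16 bytes. */
struct fxsave_area {
    unsigned char bytes[512];
} __attribute__((aligned(16)));

/* Hypothetical helper: returns the MXCSR_MASK reported by FXSAVE. */
static uint32_t read_mxcsr_mask(void)
{
    static struct fxsave_area area;
    uint32_t mask;

    memset(&area, 0, sizeof area);
    __asm__ volatile("fxsave %0" : "=m"(area));
    memcpy(&mask, area.bytes + 28, sizeof mask);  /* bytes 28..31 */

    /* A zero mask stands for the default 0x0000FFBF (DAZ bit clear). */
    return mask ? mask : 0x0000FFBFu;
}

/* Hypothetical helper: sets DAZ only when the mask says bit 6 is
   writable. Returns 1 if DAZ was enabled, 0 if it is unsupported. */
static int try_enable_daz(void)
{
    if (!(read_mxcsr_mask() & MXCSR_DAZ_BIT))
        return 0;  /* bit 6 is reserved here; leave MXCSR untouched */
    _mm_setcsr(_mm_getcsr() | MXCSR_DAZ_BIT);
    return 1;
}
```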

I think I remember CPUs where it was possible to flip the DAZ bit with no effect.
As I understand it, the Core i7-2 architecture is supposed to eliminate the effect of the FTZ and DAZ settings on performance in the cases normally encountered.

Thanks for the info!

My Core i7-2600 does handle denormals the same as normal floats for certain instructions. I don't have an extensive list of how they all perform, but I profiled pairs of addps and mulps instructions over 100,000,000 iterations. Here are my results, estimated in milliseconds:

addps
58.5 normals
58.5 denormals
58.5 FTZ+DAZ
58.5 DAZ
58.5 FTZ

mulps
59 normals
8050 denormals
59 FTZ+DAZ
59 DAZ
4120 FTZ

I can't complain about that; in fact, I'm impressed that addps works just as fast with or without denormals. I was tipped off about the difference in denormal handling between certain instructions by research that Bruce Dawson had done, http://www.altdevblogaday.com/2012/05/20/thats-not-normalthe-performance....

I've attached the code that is profiled, for anyone who is curious. Addps and Mulps are the important functions; the rest sets MXCSR with the right flags and copies normal/denormal values into the source.
