What happens if DAZ bit is set but isn't supported?

Hello,

I've been profiling some SSE instructions on our target hardware and have stumbled onto the FTZ and DAZ flags. Turning on the FTZ flag greatly increases speed, and turning on DAZ increases it a bit more (for the first instruction that receives denormal input).

This article is excellent: http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz. It notes that the DAZ flag was not supported on earlier hardware, and it even links to a document that explains how to check for DAZ support. Out of curiosity, I have to ask: what happens if you try to set the DAZ bit on hardware that doesn't support it? Does the MXCSR register change? Or is it an unused bit, so that setting it is simply ineffective?
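
For reference, my understanding of that check is that FXSAVE stores an MXCSR_MASK field, and bit 6 of that mask says whether the DAZ bit may be set. A minimal sketch, assuming GCC or Clang on x86 (the helper names are my own and do not come from the Intel document):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FXSAVE writes a 512-byte image and requires 16-byte alignment. */
struct fxsave_area {
    unsigned char bytes[512];
} __attribute__((aligned(16)));

/* Hypothetical helper: returns the MXCSR_MASK field of the FXSAVE image
   (bytes 28..31). A stored value of 0 means the default mask 0x0000FFBF. */
static uint32_t read_mxcsr_mask(void)
{
    static struct fxsave_area area;
    uint32_t mask;

    assert(((uintptr_t)&area & 15u) == 0);        /* alignment sanity check */
    __asm__ volatile("fxsave %0" : "=m"(area));   /* GCC/Clang inline asm */
    memcpy(&mask, area.bytes + 28, sizeof mask);
    return mask != 0 ? mask : 0x0000FFBFu;
}

/* Hypothetical helper: 1 if the DAZ bit (bit 6) may be set in MXCSR. */
static int daz_supported(void)
{
    return (int)((read_mxcsr_mask() >> 6) & 1u);
}
```

Calling daz_supported() once at startup would let the code fall back to FTZ only when the bit is not available.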

I think I remember CPUs where it was possible to flip the DAZ bit with no effect.
As I understand it, the Core i7-2 architecture is supposed to eliminate the performance effect of the FTZ and DAZ settings in the cases normally encountered.

Thanks for the info!

My Core i7-2600 does handle denormals at the same speed as normal floats for certain instructions. I don't have an extensive list of how every instruction performs, but I profiled pairs of addps and mulps instructions over 100,000,000 iterations. Here are my results, as estimates in milliseconds:

Input / mode    addps (ms)    mulps (ms)
normals         58.5          59
denormals       58.5          8050
FTZ+DAZ         58.5          59
DAZ             58.5          59
FTZ             58.5          4120

I can't complain about that; in fact, I'm impressed that addps runs just as fast with or without denormals. I was tipped off to the difference in denormal handling between certain instructions by research from Bruce Dawson: http://www.altdevblogaday.com/2012/05/20/thats-not-normalthe-performance....

I've attached the code that I profiled, for anyone who is curious. Addps and Mulps are the important functions; the rest sets MXCSR with the right flags and copies normal/denormal values into source.
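
For anyone who just wants the gist without opening the attachment, the MXCSR part can be sketched roughly like this. It is an illustration rather than the attached code: mul_with_flags and the MXCSR_* macros are names made up for this sketch, it assumes an x86-64 build with GCC or Clang (so that float multiplication is done in SSE), and it assumes the CPU supports DAZ. As far as I can tell from the Intel manuals, writing a reserved MXCSR bit faults rather than being silently ignored, so DAZ should only be set where it is supported.

```c
#include <assert.h>
#include <float.h>       /* FLT_MIN, used in the checks and usage below */
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */

#define MXCSR_DAZ (1u << 6)    /* denormals-are-zero: denormal inputs read as 0 */
#define MXCSR_FTZ (1u << 15)   /* flush-to-zero: denormal results become 0 */

/* Hypothetical helper: multiplies a by b with exactly the requested
   FTZ/DAZ bits set, then restores the caller's MXCSR. */
static float mul_with_flags(float a, float b, unsigned flags)
{
    unsigned saved = _mm_getcsr();
    volatile float x = a, y = b, r;   /* volatile: no compile-time folding */

    assert((flags & ~(MXCSR_DAZ | MXCSR_FTZ)) == 0);   /* only these two bits */
    _mm_setcsr((saved & ~(MXCSR_DAZ | MXCSR_FTZ)) | flags);
    r = x * y;                        /* mulss on x86-64, governed by MXCSR */
    _mm_setcsr(saved);
    return r;
}
```

With this, mul_with_flags(1e-40f, 1e30f, MXCSR_DAZ) returns 0 because the denormal input is treated as zero, and mul_with_flags(FLT_MIN, 0.5f, MXCSR_FTZ) returns 0 because the denormal result is flushed. With FTZ alone the first product stays nonzero, since FTZ only affects results, not inputs.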
