When I started programming with SSE, I kept wondering why there were several instructions that seem to do the same thing. After all, MOVAPS, MOVAPD and MOVDQA should all have the same effect, loading 128 bits, right?
Then I read more about FTZ and DAZ and realized that DAZ forces denormal inputs to zero (I think) when data is loaded, or at least before some operations (but which ones?). That made me realize how bad it would be to load integers with the float versions of the MOVs.
Now my problem: I have 4 DWORDs (signed integers) sitting in the low halves of 2 XMM registers (two in each), and I'd like to pack them into a single register. I don't see any quick way to do this, which is why I'm looking for details about DAZ (they seem very hard to find), to know whether it applies to MOVLHPS or SHUFPS. Or should I use shift+OR?
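For reference, these are the candidates I'm comparing, written with intrinsics. This is only a sketch under my own assumptions: the two DWORDs of interest sit in the low 64 bits of `lo` and `hi`, the upper halves may hold junk, and all helper names are mine. I also added `_mm_unpacklo_epi64` (PUNPCKLQDQ) as the integer-domain counterpart of MOVLHPS. Small integers such as 1..4 are denormals when their bits are read as floats, so running the variants with DAZ/FTZ switched on is a direct way to see whether the float-domain shuffles alter them.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE:  _mm_getcsr, _mm_setcsr, _mm_movelh_ps, _mm_shuffle_ps */
#include <emmintrin.h>  /* SSE2: __m128i, casts, _mm_unpacklo_epi64, _mm_slli_si128 */

/* Set DAZ (MXCSR bit 6, 0x0040) and FTZ (bit 15, 0x8000), as the sequencer does.
   Assumes a CPU that supports DAZ. */
static void enable_daz_ftz(void)
{
    _mm_setcsr(_mm_getcsr() | 0x0040u | 0x8000u);
    assert((_mm_getcsr() & 0x8040u) == 0x8040u);
}

/* Every variant maps lo = {a0, a1, junk, junk}, hi = {b0, b1, junk, junk}
   (32-bit lanes, lowest first) to {a0, a1, b0, b1}. */

/* MOVLHPS: low 64 bits of hi go into the high 64 bits of lo. */
static __m128i pack_movlhps(__m128i lo, __m128i hi)
{
    return _mm_castps_si128(_mm_movelh_ps(_mm_castsi128_ps(lo),
                                          _mm_castsi128_ps(hi)));
}

/* SHUFPS: lanes 0,1 from the first operand, then lanes 0,1 from the second. */
static __m128i pack_shufps(__m128i lo, __m128i hi)
{
    return _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(lo),
                                           _mm_castsi128_ps(hi),
                                           _MM_SHUFFLE(1, 0, 1, 0)));
}

/* PUNPCKLQDQ: integer-domain counterpart of MOVLHPS. */
static __m128i pack_unpacklo(__m128i lo, __m128i hi)
{
    return _mm_unpacklo_epi64(lo, hi);
}

/* Shift+OR: clear lo's upper half (MOVQ), shift hi left by 8 bytes, then OR. */
static __m128i pack_shift_or(__m128i lo, __m128i hi)
{
    return _mm_or_si128(_mm_move_epi64(lo), _mm_slli_si128(hi, 8));
}

/* Returns 1 if the four 32-bit lanes of v (lowest first) are e0..e3. */
static int lanes_are(__m128i v, int32_t e0, int32_t e1, int32_t e2, int32_t e3)
{
    int32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[0] == e0 && out[1] == e1 && out[2] == e2 && out[3] == e3;
}
```

Note that shift+OR only works when the upper half of `lo` is cleared first, which is why it needs the extra MOVQ (`_mm_move_epi64`); the other three variants ignore the junk lanes on their own.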
Also, is there any reason other than DAZ to avoid mixing the MOV variants?
Finally, regarding the IPP libraries: is there a risk that denormals get flushed to zero when simply copying/moving blocks of memory with the ippsMove_32f or ippsMove_64f versions? Should ippsMove_64s be preferred?
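One way to check the copy question empirically would be to fill a buffer with denormal bit patterns, copy it with DAZ/FTZ on, and compare the bytes. In this sketch `copy32f_fn` and `copy_movups` are hypothetical stand-ins (a float-domain MOVUPS loop with a memcpy tail); I'm not assuming anything about how ippsMove_32f is implemented internally, so the IPP call would have to be wrapped to match this signature before testing it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_getcsr, _mm_setcsr */

/* Hypothetical copy signature used only for this test; not IPP's API. */
typedef void (*copy32f_fn)(const float *src, float *dst, int len);

/* Stand-in copy: float-domain 128-bit loads/stores, memcpy for the tail. */
static void copy_movups(const float *src, float *dst, int len)
{
    int i = 0;
    for (; i + 4 <= len; i += 4)
        _mm_storeu_ps(dst + i, _mm_loadu_ps(src + i));
    memcpy(dst + i, src + i, (size_t)(len - i) * sizeof *src);
}

/* Returns 1 if copy_fn reproduces every 32-bit pattern in bits exactly. */
static int copy_is_bit_exact(copy32f_fn copy_fn, const uint32_t *bits, int len)
{
    float src[64], dst[64];
    assert(len >= 0 && len <= 64);
    memcpy(src, bits, (size_t)len * sizeof(float));
    memset(dst, 0xAA, sizeof dst);  /* poison so a skipped element shows up */
    copy_fn(src, dst, len);
    return memcmp(src, dst, (size_t)len * sizeof(float)) == 0;
}
```

Passing a length that is not a multiple of 4 exercises both the vector loop and the tail; a wrapper around memcpy makes a useful baseline next to whatever routine is being checked.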
Note: this is for an audio sequencer, so the DAZ flag is always forced on (when the CPU supports it).