There is a known performance penalty for mixing 256-bit AVX instructions with legacy (non-VEX) SSE code on today's Haswell processors.
- Will this same problem exist in the Skylake chips and AVX512?
- Will the delay be even longer for the ZMM registers, since there are twice as many to save and restore and each is twice as wide?
- The workaround instruction for this, VZEROUPPER, is not listed as changed in Intel's Instruction Extensions manual. Won't there be changes, such as also zeroing the high portion of the ZMM registers?
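
For context, here is a minimal sketch of how the VZEROUPPER workaround is usually applied on current AVX hardware. The function `sum_avx` is a hypothetical helper made up for illustration, and the `target("avx")` attribute assumes GCC or Clang; `_mm256_zeroupper()` is the standard intrinsic that emits VZEROUPPER. The idea is to clear the upper 128 bits of the YMM registers once the 256-bit work is done, before control returns to code that may execute legacy SSE instructions.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper (not from any library): sums n floats with 256-bit
   AVX, then issues VZEROUPPER before returning to the caller, which may
   run legacy (non-VEX) SSE code. Assumes GCC/Clang for the attribute. */
__attribute__((target("avx")))
float sum_avx(const float *data, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;

    /* Main loop: accumulate eight floats per iteration. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));

    /* Reduce the eight lanes to a scalar. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; ++k)
        total += lanes[k];

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        total += data[i];

    /* Zero bits 128 and above of the YMM registers (VZEROUPPER). */
    _mm256_zeroupper();
    return total;
}
```

Compilers often insert VZEROUPPER automatically at the end of functions that use 256-bit instructions, so the explicit call mainly documents intent and covers hand-written or inline-assembly code paths.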
This is documented by Intel: