I already got some experience with SSE to AVX transition penalties and read the following article: http://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoidin...
There is written, only zeroall or zeroupper gets the cpu in the safe state where no penalties can occure.
Isn't this a problem in multithreading, multiprocessing? I mean, assume process A is running with SSE legacy code. For example normal floating point operations with scalar SSE code. And process B is using AVX and only at the end of function has a zeroupper.
What if context switch occurs in the middle of AVX code? The OS will switch context including YMM registers. But even if the upper are all zero, wouldn't the cpu remain in the other state? So context switches might lead to penalties for process A without any influene of the programmer. Or is there something I missunderstood?
This scenario just came to my mind and I don't know how one could solve this. Or is there a possibility for the OS to avoid this problem?