I have some SSE/AVX code that I'm trying to test with the Intel Software Development Emulator (SDE) on CPUs that lack native support for some of the instruction set extensions. In particular, I tried the following setups:
1. Sandy Bridge CPU, SDE running with the -hsw switch.
2. Sandy Bridge CPU, SDE running with the -hsw -sse-sde switches.
3. A KVM guest virtual machine with SSE4 instructions (host CPU is Nehalem), SDE running with the -hsw switch.
All of this is on Linux x86_64, with SDE 6.22 and 6.12.
What I'm seeing is that when my code runs an emulated path (i.e. the AVX2 path, or the AVX path when AVX is emulated), I sometimes get corrupted results. The behavior is not deterministic: the same input data can produce correct results in one run and corrupted results in the next. I'm sure my code is correct because I tried it on a Haswell machine and it works every time. Also, the AVX path fails only when emulated, not when executed natively. My code is single-threaded, so there aren't any concurrency issues. I have not yet worked out which instructions trigger this behavior, since the code is quite large.
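To narrow down which code path misbehaves, one option is to run a single kernel repeatedly on the same input and compare each run against a scalar reference. The following is only a minimal sketch of that idea, not my actual code: `kernel_fn`, `kernel_scalar` and `count_mismatching_runs` are hypothetical names, and the byte-increment kernel is a stand-in for a real routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kernel signature: reads n bytes from src, writes n bytes to dst. */
typedef void (*kernel_fn)(const uint8_t *src, uint8_t *dst, size_t n);

/* Scalar reference implementation (placeholder operation: add 1 to each byte). */
static void kernel_scalar(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (uint8_t)(src[i] + 1);
}

/*
 * Runs `test` `runs` times on the same input and compares every result with
 * the output of `ref`. Returns the number of runs that differed, or -1 if
 * memory allocation failed. The first differing byte of each bad run is
 * reported on stderr.
 */
static int count_mismatching_runs(kernel_fn ref, kernel_fn test,
                                  const uint8_t *src, size_t n, int runs)
{
    assert(ref && test && src && n > 0 && runs > 0);

    uint8_t *expected = malloc(n);
    uint8_t *actual = malloc(n);
    if (!expected || !actual) {
        free(expected);
        free(actual);
        return -1;
    }

    ref(src, expected, n);

    int bad = 0;
    for (int r = 0; r < runs; ++r) {
        /* Poison the output so stale data from a previous run cannot mask a failure. */
        memset(actual, 0xAA, n);
        test(src, actual, n);

        if (memcmp(expected, actual, n) != 0) {
            ++bad;
            for (size_t i = 0; i < n; ++i) {
                if (expected[i] != actual[i]) {
                    fprintf(stderr, "run %d: first mismatch at byte %zu "
                            "(expected 0x%02x, got 0x%02x)\n",
                            r, i, (unsigned)expected[i], (unsigned)actual[i]);
                    break;
                }
            }
        }
    }

    free(expected);
    free(actual);
    return bad;
}
```

In practice `test` would be the AVX or AVX2 version of one routine, and the binary would be run under SDE with the same switches as in the setups above. A nonzero count for a kernel that passes natively would point at the emulated path for that particular routine.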
Has anyone else run into these issues? Is there a workaround?