I'm looking for the smartest (i.e. fastest) way to insert a DWORD into an arbitrary element of a 256-bit AVX (ymm) register.
Here is what I found so far:
AVX vinsertps doesn't work because it zeroes the upper 128 bits of the destination, and its immediate can't address elements in the upper 128 bits anyway.
AVX vpinsrd doesn't work for the same reason and, truly sad unless the docs are wrong, wasn't promoted to a 256-bit form in AVX2, even though the immediate has spare bits that could encode the insert position in a 256-bit vector.
There are lots of multi-instruction workarounds I can think of, but I hoped the Intel engineers had a smart trick for this basic operation that I overlooked.
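
For reference, here is a rough sketch of two such workarounds written with C intrinsics, assuming GCC or Clang and a compile-time-constant element index. The macro and function names (INSERT_DWORD_BLEND, INSERT_DWORD_LANE, demo_insert_elem5) are made up for illustration. The first broadcasts the DWORD and blends it into one element; the second extracts the 128-bit half, uses vpinsrd on it, and puts it back. I haven't measured which is faster, and the exact instructions emitted depend on the compiler.

```c
#include <immintrin.h>
#include <stdint.h>

/* GCC/Clang-specific: enable AVX for this function without needing -mavx. */
#define AVX_FN __attribute__((target("avx")))

/* Workaround 1: broadcast x to all 8 elements, then blend element IDX
   into v. IDX must be a compile-time constant in 0..7. The blend is a
   single vblendps; the broadcast costs extra instructions. */
#define INSERT_DWORD_BLEND(v, x, IDX)                         \
    _mm256_castps_si256(_mm256_blend_ps(                      \
        _mm256_castsi256_ps(v),                               \
        _mm256_castsi256_ps(_mm256_set1_epi32(x)),            \
        1 << (IDX)))

/* Workaround 2: extract the 128-bit half that holds element IDX,
   insert x there with vpinsrd, then write the half back.
   Note that v is evaluated twice, so pass a plain variable. */
#define INSERT_DWORD_LANE(v, x, IDX)                          \
    _mm256_insertf128_si256((v),                              \
        _mm_insert_epi32(                                     \
            _mm256_extractf128_si256((v), (IDX) >> 2),        \
            (x), (IDX) & 3),                                  \
        (IDX) >> 2)

/* Hypothetical demo helper: inserts x into element 5 of in[] using
   both workarounds and stores each result to its own output array. */
AVX_FN void demo_insert_elem5(int32_t out_blend[8], int32_t out_lane[8],
                              const int32_t in[8], int32_t x)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm256_storeu_si256((__m256i *)out_blend, INSERT_DWORD_BLEND(v, x, 5));
    _mm256_storeu_si256((__m256i *)out_lane,  INSERT_DWORD_LANE(v, x, 5));
}
```

Both need at least two instructions plus whatever the broadcast or extract costs, which is exactly what I was hoping to avoid.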