How to reinterpret a __m128 value as __m256?

I'm looking for a way to do what _mm256_cast??128_??256 does, but without the caveat that "the upper 128 bits are undefined". I execute a VEX-encoded SSE instruction, which writes its result to the lower 128 bits of the register and zeroes the upper 128 bits. Now I want to keep using this register in an AVX intrinsic, or just store the whole 256 bits to memory. With the currently available intrinsics, I see no safe way other than to use

_mm256_insertf128_??(_mm256_cast??128_??256(x), _mm_setzero_??(), 1)
, which is major overkill for something that, in reality, doesn't need any extra instructions.
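
Spelled out for the float case, the workaround above might look like the following minimal sketch. The helper names zext_ps128_ps256 and zext_store_ps and the AVX_TARGET macro are made up for illustration, and the per-function target attribute is an assumption that only applies to GCC and clang; other compilers would need their own AVX switch.

```c
#include <immintrin.h>

/* Assumption: GCC/clang accept a per-function target attribute, so the file
   builds without -mavx; other compilers need their own way to enable AVX. */
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Hypothetical helper: widen x to 256 bits with a guaranteed-zero upper half. */
AVX_TARGET static __m256 zext_ps128_ps256(__m128 x)
{
    /* The cast leaves bits 255:128 undefined; the insert overwrites them
       with zeros, so the result no longer depends on what the cast did. */
    return _mm256_insertf128_ps(_mm256_castps128_ps256(x), _mm_setzero_ps(), 1);
}

/* Hypothetical wrapper: load 4 floats, widen, and store all 8 lanes. */
AVX_TARGET void zext_store_ps(const float in[4], float out[8])
{
    _mm256_storeu_ps(out, zext_ps128_ps256(_mm_loadu_ps(in)));
}
```

Whether the compiler folds the insert away is up to its optimizer, so this only guarantees the result, not the instruction count. Newer compiler headers also provide _mm256_zextps128_ps256, which is documented to zero the upper half; whether a given toolchain ships it is something to check rather than assume.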

From my tests, the cast intrinsic does what I want with clang, GCC, and ICC. But MSVC prefers to do the cast via a 128-bit store + 256-bit load (stupid compiler). And even if I got lucky with MSVC as well, I'd rather not depend on undefined behavior. Do you have any idea how to do this? A compiler-specific solution would also be interesting, e.g. an inline asm statement that does what I need...

>>...But MSVC...stupid compiler...

Sorry, but I think such statements do not look good.

Yes, sorry. I'm rather frustrated by all the quirks, bugs and missing optimizations in MSVC...