Using Xeon Phi intrinsics in C++, I would like to interleave the float values of two registers. It is basically a conversion from a vector of structures (vector<complex<float> >) to a structure of vectors. I guess it is somehow related to swizzle and shuffle, but looking at the compiler and instruction set manuals I don't see how to do it. Here are the "formal" specs.

- Given v1, v2 of type __m512, both containing 16 floats, transform v1=(x7 y7 ... x0 y0), v2=(x15 y15 ... x8 y8) into v1=(y15 y14 ... y1 y0), v2=(x15 x14 ... x1 x0)
- Given v1, v2 of type __m512, both containing 16 floats, transform v1=(y15 y14 ... y1 y0), v2=(x15 x14 ... x1 x0) into v1=(x7 y7 ... x0 y0), v2=(x15 y15 ... x8 y8) (basically the reverse of the first operation)
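
To make the two specs above unambiguous, here is a plain scalar C++ sketch of what I mean. It is only a reference model, not Xeon Phi code: Vec16 is a hypothetical stand-in for __m512, the function names deinterleave and interleave are made up for illustration, and I assume the rightmost element in the notation above is lane 0 (so lane 0 of the input v1 is y0 and lane 1 is x0).

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for __m512: 16 floats, index 0 is the lowest lane.
using Vec16 = std::array<float, 16>;

// First spec: v1=(x7 y7 ... x0 y0), v2=(x15 y15 ... x8 y8)
//         ->  v1=(y15 ... y0),      v2=(x15 ... x0)
void deinterleave(Vec16& v1, Vec16& v2) {
    std::array<float, 32> all{};
    for (std::size_t i = 0; i < 16; ++i) {
        all[i] = v1[i];        // pairs 0..7
        all[16 + i] = v2[i];   // pairs 8..15
    }
    for (std::size_t i = 0; i < 16; ++i) {
        v1[i] = all[2 * i];      // even lanes hold y_i
        v2[i] = all[2 * i + 1];  // odd lanes hold x_i
    }
}

// Second spec: the reverse of deinterleave().
void interleave(Vec16& v1, Vec16& v2) {
    std::array<float, 32> all{};
    for (std::size_t i = 0; i < 16; ++i) {
        all[2 * i] = v1[i];      // y_i goes to the even lane
        all[2 * i + 1] = v2[i];  // x_i goes to the odd lane
    }
    for (std::size_t i = 0; i < 16; ++i) {
        v1[i] = all[i];
        v2[i] = all[16 + i];
    }
}
```

Calling deinterleave(v1, v2) followed by interleave(v1, v2) should leave both vectors unchanged; what I am looking for is the intrinsic equivalent of these two functions.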

With SSE, I do it with _mm_shuffle_ps() and _mm_unpackhi/lo_ps(), but how can I do it (efficiently) on Xeon Phi?

Georg