Here is a solution using AVX2 (operating on 32-bit integer elements):
#1:
org: ymm0 = x x x x a3 a2 a1 a0
vpermq ymm0,ymm0,0x10 => ymm0 = x x a3 a2 x x a1 a0 / select qwords x1x0
vpunpckldq ymm0,ymm0,ymm0 => ymm0 = a3 a3 a2 a2 a1 a1 a0 a0 / interlace low dwords
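For anyone who prefers intrinsics, the two instructions in #1 can be sketched in C roughly as follows. The function name `dup_dwords` and the array-based interface are my own assumptions for illustration; the `target("avx2")` attribute is a GCC/Clang extension used here so the file builds without `-mavx2`.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: src = {a0, a1, a2, a3} in memory order,
   dst receives {a0, a0, a1, a1, a2, a2, a3, a3}.
   The target attribute is a GCC/Clang extension; the CPU must still support AVX2. */
__attribute__((target("avx2")))
void dup_dwords(const int32_t src[4], int32_t dst[8])
{
    /* Low 128 bits = a3 a2 a1 a0; the upper 128 bits are don't-care. */
    __m256i v = _mm256_castsi128_si256(_mm_loadu_si128((const __m128i *)src));

    /* vpermq 0x10: move qword 1 (a3 a2) to the bottom of the high lane. */
    v = _mm256_permute4x64_epi64(v, 0x10);

    /* vpunpckldq: interleave the low dwords of each lane with themselves. */
    v = _mm256_unpacklo_epi32(v, v);

    _mm256_storeu_si256((__m256i *)dst, v);
}
```

Called with `src = {10, 11, 12, 13}`, this should leave `dst = {10, 10, 11, 11, 12, 12, 13, 13}`.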
#2:
org: ymm0 = b3 a3 b2 a2 b1 a1 b0 a0
vpshufd ymm1,ymm0,0x08 => ymm1 = x x a3 a2 x x a1 a0 / select dwords xx20
vpshufd ymm2,ymm0,0x0d => ymm2 = x x b3 b2 x x b1 b0 / select dwords xx31
vpermq ymm1,ymm1,0x08 => ymm1 = x x x x a3 a2 a1 a0 / select qwords xx20
vpermq ymm2,ymm2,0x08 => ymm2 = x x x x b3 b2 b1 b0 / select qwords xx20
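The same four instructions for #2 could be written with intrinsics along these lines. This is only a sketch: `deinterleave_dwords` and its pointer parameters are hypothetical names, and the `target("avx2")` attribute is a GCC/Clang extension that stands in for compiling with `-mavx2`.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: src = {a0, b0, a1, b1, a2, b2, a3, b3} in memory order,
   a receives {a0, a1, a2, a3} and b receives {b0, b1, b2, b3}.
   The target attribute is a GCC/Clang extension; the CPU must still support AVX2. */
__attribute__((target("avx2")))
void deinterleave_dwords(const int32_t src[8], int32_t a[4], int32_t b[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);

    /* vpshufd 0x08 / 0x0d: gather dwords 2,0 (the a's) or 3,1 (the b's)
       into the low qword of each 128-bit lane. */
    __m256i va = _mm256_shuffle_epi32(v, 0x08);
    __m256i vb = _mm256_shuffle_epi32(v, 0x0d);

    /* vpermq 0x08: bring qwords 2 and 0 together in the low 128 bits. */
    va = _mm256_permute4x64_epi64(va, 0x08);
    vb = _mm256_permute4x64_epi64(vb, 0x08);

    /* Only the low 128 bits carry meaningful data. */
    _mm_storeu_si128((__m128i *)a, _mm256_castsi256_si128(va));
    _mm_storeu_si128((__m128i *)b, _mm256_castsi256_si128(vb));
}
```

With `src = {0, 100, 1, 101, 2, 102, 3, 103}`, this should give `a = {0, 1, 2, 3}` and `b = {100, 101, 102, 103}`.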
#3:
org: ymm1 = x x x x a3 a2 a1 a0; ymm2 = x x x x b3 b2 b1 b0
vpermq ymm1,ymm1,0x10 => ymm1 = x x a3 a2 x x a1 a0 / select qwords x1x0
vpermq ymm2,ymm2,0x10 => ymm2 = x x b3 b2 x x b1 b0 / select qwords x1x0
vpunpckldq ymm0,ymm1,ymm2 => ymm0 = b3 a3 b2 a2 b1 a1 b0 a0 / interlace low dwords
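For completeness, here is a rough intrinsics version of #3. The function name `interleave_dwords` and the array interface are assumptions made for the example, and the `target("avx2")` attribute is again a GCC/Clang extension used in place of `-mavx2`.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: a = {a0, a1, a2, a3}, b = {b0, b1, b2, b3} in memory order,
   dst receives {a0, b0, a1, b1, a2, b2, a3, b3}.
   The target attribute is a GCC/Clang extension; the CPU must still support AVX2. */
__attribute__((target("avx2")))
void interleave_dwords(const int32_t a[4], const int32_t b[4], int32_t dst[8])
{
    /* Each input sits in the low 128 bits; the upper halves are don't-care. */
    __m256i va = _mm256_castsi128_si256(_mm_loadu_si128((const __m128i *)a));
    __m256i vb = _mm256_castsi128_si256(_mm_loadu_si128((const __m128i *)b));

    /* vpermq 0x10: spread qword 1 into the low half of the high lane. */
    va = _mm256_permute4x64_epi64(va, 0x10);
    vb = _mm256_permute4x64_epi64(vb, 0x10);

    /* vpunpckldq: per-lane interleave of the low dwords of va and vb. */
    __m256i r = _mm256_unpacklo_epi32(va, vb);

    _mm256_storeu_si256((__m256i *)dst, r);
}
```

With `a = {0, 1, 2, 3}` and `b = {100, 101, 102, 103}`, `dst` should end up as `{0, 100, 1, 101, 2, 102, 3, 103}`, which is the exact inverse of #2.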





Cross lane operations, how?
Question #1
I have:
xmm0/mem128 = A3 A2 A1 A0
And I want to have:
ymm0 = A3 A3 A2 A2 A1 A1 A0 A0
Question #2
I have:
ymm0 = B3 A3 B2 A2 B1 A1 B0 A0
And I want to have:
xmm1/mem128 = A3 A2 A1 A0
xmm2/mem128 = B3 B2 B1 B0
Question #3
I have:
xmm1/mem128 = A3 A2 A1 A0
xmm2/mem128 = B3 B2 B1 B0
And I want to have:
ymm0 = B3 A3 B2 A2 B1 A1 B0 A0
How can I accomplish these seemingly trivial transformations, bearing in mind the AVX cross-lane limitations?