I have yet to find the optimal way. There are two and a half solutions, and none of them is flawless.
1. vbroadcasti128 ymm0, xmmword ptr [...]
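The second operand must be a 128-bit memory operand. That is perfect for loading global constants, since the constant takes 16 bytes in memory instead of 32, but not if you already have the source in an xmm register. Why is there no register-to-register form? Even the intrinsic (_mm256_broadcastsi128_si256) takes its argument by value, so to actually get this instruction the compiler would have to store the register first and then reload it, crazy.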
2. cast xmm0 to ymm0, vinserti128 ymm0, ymm0, xmm0, 1
That looks like the obvious choice, but when you think about it, there is a RAW dependency: whatever you were doing with the register has to finish before the insert can execute. There is also the option of using a second register: vpxor it with itself first, then insert into it twice, once per lane. Not sure if it's worth it.
3. cast xmm0 to ymm0, vperm2i128 ymm0, ymm0, ymm0, 0
Unless the CPU has some kind of smart check for whether this instruction overwrites both lanes, this one is also dependent, same problem as with vinserti128.
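For anyone who wants to try the three variants from C, here is a minimal sketch using the corresponding intrinsics. The function names (bcast_from_memory, bcast_insert, bcast_permute) and the AVX2_FN macro are made up for illustration, and the target attribute is an assumption that you build with GCC or Clang. Each function writes its result to memory so it can be called from code that is not compiled with AVX2 enabled.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX2 for these functions only, so the
   file builds without -mavx2. Hypothetical helper macro. */
#define AVX2_FN __attribute__((target("avx2")))

/* Option 1: the source lives in memory. Compilers usually fold the load
   into vbroadcasti128 ymm, [mem], but that is not guaranteed. */
AVX2_FN void bcast_from_memory(const uint32_t src[4], uint32_t out[8])
{
    __m128i x = _mm_loadu_si128((const __m128i *)src);
    _mm256_storeu_si256((__m256i *)out, _mm256_broadcastsi128_si256(x));
}

/* Option 2: cast xmm to ymm (upper lane undefined), then insert the same
   128 bits into the upper lane (vinserti128 ..., 1). */
AVX2_FN void bcast_insert(const uint32_t src[4], uint32_t out[8])
{
    __m128i x = _mm_loadu_si128((const __m128i *)src);
    __m256i y = _mm256_inserti128_si256(_mm256_castsi128_si256(x), x, 1);
    _mm256_storeu_si256((__m256i *)out, y);
}

/* Option 3: cast xmm to ymm, then vperm2i128 with imm8 = 0, which selects
   the low lane of the first source for both destination lanes. */
AVX2_FN void bcast_permute(const uint32_t src[4], uint32_t out[8])
{
    __m128i x = _mm_loadu_si128((const __m128i *)src);
    __m256i y = _mm256_castsi128_si256(x);
    y = _mm256_permute2x128_si256(y, y, 0x00);
    _mm256_storeu_si256((__m256i *)out, y);
}
```

All three should produce the same 256-bit value with the source repeated in both lanes. Keep in mind that the compiler is free to emit a different instruction than the intrinsic suggests, so check the disassembly before drawing conclusions about which form you are actually measuring.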