As it says in the title. ICC 17 generates:
    my_copysign_1(float):
            movss     xmm1, DWORD PTR .L_2il0floatpacket.1[rip]     #4.12
            movss     xmm2, DWORD PTR .L_2il0floatpacket.0[rip]     #4.12
            andps     xmm0, xmm2                                    #4.12
            andnps    xmm2, xmm1                                    #4.12
            orps      xmm0, xmm2                                    #4.12
            ret                                                     #4.12
    .L_2il0floatpacket.0:
            .long     0x80000000
    .L_2il0floatpacket.1:
            .long     0x3f800000
A much better result is generated by e.g. GCC:
    my_copysign_1(float):
            andps   xmm0, XMMWORD PTR .LC1[rip]
            orps    xmm0, XMMWORD PTR .LC0[rip]
            ret
    .LC0:
            .long   1065353216
            .long   0
            .long   0
            .long   0
    .LC1:
            .long   2147483648
            .long   0
            .long   0
            .long   0
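Even if you don't like the extra space used by GCC's 16-byte constants (and you should like it, because my profiles show it's faster), ICC's sequence of `andps`, `andnps`, and `orps` can still be reduced by one instruction. The `andnps` only clears the sign bit of the constant 1.0f (`0x3f800000`), and that bit is already clear, so the operation could be folded away at compile time.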
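For reference, here is a minimal C++ sketch of what the two-instruction form computes. The original source is not shown in this post, so the body of `my_copysign_1` below is my assumption (a plain `std::copysign(1.0f, x)`), and `copysign_1_bits` is a hypothetical helper name used only for illustration. It keeps just the sign bit of `x` (the `andps` with `0x80000000`) and ORs in the bit pattern of 1.0f (the `orps` with `0x3f800000`):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit IEEE 754 float");

// Assumed shape of the function under discussion (not taken from the post).
float my_copysign_1(float x) {
    return std::copysign(1.0f, x);
}

// Hypothetical helper: the same result using two bitwise operations,
// mirroring the andps + orps pair in the GCC output.
float copysign_1_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float as raw bits
    bits &= 0x80000000u;                   // keep only the sign bit of x
    bits |= 0x3f800000u;                   // OR in the bit pattern of 1.0f
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

No masking of the constant is needed because the sign bit of `0x3f800000` is zero, which is exactly the step the `andnps` performs at run time in the ICC output.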