It seems Intel has released graphics documentation for the IVB (Ivy Bridge) GPU at intellinuxgraphics, so we now have the GPU instruction set.
Some time ago I asked Intel to expose asm("") in OpenCL, similar to what Nvidia's OpenCL GPU support offers. Maybe now it makes sense to ask again. For the CPU it perhaps doesn't make sense, since code can be vectorized and asm code would seem to prevent automatic vectorization anyway. But GPUs support SIMT in hardware, and just as Nvidia exposes PTX in OpenCL, I think Intel should expose the native GPU instructions that OpenCL currently doesn't expose, such as addc (add with carry, for more efficient big-number support). As noted in the manual:
*Add bit manipulation instructions: bfi1, bfi2, bfrev, cbit, fbh, and fbl.
*Add the integer addc (Add with Carry) and subb (Subtract with Borrow) instructions.
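To illustrate why a native addc matters for big numbers, here is a minimal sketch in plain C of what has to be done today without it. The names addc_emulated and bignum_add are hypothetical helpers made up for this post, not anything from Intel's manual or from OpenCL. The carry has to be recovered with extra comparisons on every limb, which is the work a hardware add-with-carry would presumably do in one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: portable emulation of add-with-carry on one 32-bit limb.
   Returns the low 32 bits of a + b + carry_in (carry_in must be 0 or 1)
   and writes the carry-out (0 or 1) to *carry_out. */
static uint32_t addc_emulated(uint32_t a, uint32_t b, uint32_t carry_in,
                              uint32_t *carry_out)
{
    uint32_t sum = a + b;           /* wraps modulo 2^32 */
    uint32_t c1  = sum < a;         /* 1 if a + b overflowed */
    uint32_t res = sum + carry_in;
    uint32_t c2  = res < sum;       /* 1 if adding the carry overflowed */
    *carry_out = c1 | c2;           /* both cannot be 1 at the same time */
    return res;
}

/* Hypothetical helper: adds two n-limb big numbers stored least significant
   limb first. Writes n limbs to out and returns the final carry (0 or 1). */
static uint32_t bignum_add(const uint32_t *a, const uint32_t *b,
                           uint32_t *out, size_t n)
{
    assert(a && b && out);
    uint32_t carry = 0;
    for (size_t i = 0; i < n; ++i)
        out[i] = addc_emulated(a[i], b[i], carry, &carry);
    return carry;
}
```

With a native addc exposed, the two comparisons and the OR per limb in addc_emulated would go away, and the loop body would be roughly a single instruction per limb.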
Another option would be to expose them as intrinsics (__bfi1, __addc, etc.), but that would possibly take more work than a simple asm() extension.
I think this will become necessary as GPU instruction sets get richer. For example, new Nvidia GPUs expose a warp shuffle (data exchange between threads within a warp), and using it gives the fastest reductions.