This SSE instruction would gather four 32-bit integers or single-precision floats from four different memory locations into the destination XMM register. The addresses of those locations could be supplied either from memory or from another XMM register, as four 32-bit integers.
In 64-bit mode, where a 32-bit value cannot hold a full pointer, it could use the RSI register as a base address, and the 32-bit values from the XMM register or memory operand would then serve as offsets from that base.
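To make the proposed semantics concrete, here is a scalar C model of what the instruction would compute: each of the four destination lanes receives the 32-bit value found at base + offset[i]. This is only a sketch, not a real instruction or intrinsic; the function name `gather32_model` is hypothetical, and treating the offsets as signed 32-bit byte offsets (rather than element indices) is an assumption, since the proposal does not specify it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the proposed gather (not a real instruction).
   dst receives four 32-bit values; lane i is read from base + off[i].
   Assumption: off[i] are signed 32-bit BYTE offsets from base (playing the
   role of RSI in 64-bit mode). Works for both int32 and float data. */
static void gather32_model(void *dst, const void *base, const int32_t off[4])
{
    assert(dst != NULL && base != NULL && off != NULL);
    const unsigned char *src = (const unsigned char *)base;
    unsigned char *out = (unsigned char *)dst;
    for (int i = 0; i < 4; i++) {
        /* memcpy avoids alignment and strict-aliasing issues for
           arbitrary offsets */
        memcpy(out + 4 * i, src + off[i], 4);
    }
}
```

For example, with a `float table[8]` holding 0..7 and byte offsets `{12, 0, 28, 4}`, the four lanes would receive elements 3, 0, 7 and 1.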
Such an instruction would be most useful for interpolation. In that case it would usually gather adjacent or even overlapping values from memory, so various internal optimizations could be possible.
Could someone pass this idea on to the CPU development team?