I have a question about the shufps instruction: what kind of C code would usually cause the compiler to generate shufps?
Thank you for your help!
I'm sure we can't guess your target without more hints. Typically it is code that gathers or scatters elements to and from a packed vector when compiled with an SSE2 code option, possibly with the help of #pragma vector always. Setting SSE4 options would favor newer instructions for the same purpose.
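To make the answer above a bit more concrete, here is a minimal sketch, assuming GCC or Clang on an x86 target with SSE enabled. It is plain C with no intrinsics, and the function name shuffle_groups is hypothetical. For each group of four floats, it picks two elements from one input and two from the other, which is exactly the data movement of one shufps. An auto-vectorizing compiler may turn such a loop into shufps, but that is not guaranteed. It depends on the compiler, its version and flags (the Intel compiler's #pragma vector always is omitted here), so check the generated assembly, for example with -S.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: plain C whose per-group data movement matches one
   SHUFPS. The low two outputs come from a, the high two from b. Whether
   the compiler really emits shufps depends on compiler, version and flags
   (e.g. -O3 -msse2); inspect the assembly to confirm. */
void shuffle_groups(float *restrict out, const float *restrict a,
                    const float *restrict b, size_t n_groups)
{
    /* restrict promises no overlap; this catches the most obvious misuse. */
    assert(out != a && out != b);

    for (size_t i = 0; i < n_groups; ++i) {
        const float *ga = a + 4 * i;
        const float *gb = b + 4 * i;
        float *go = out + 4 * i;
        go[0] = ga[1]; /* two elements from the first input ... */
        go[1] = ga[0];
        go[2] = gb[3]; /* ... and two from the second input */
        go[3] = gb[2];
    }
}
```

Writing the permutation as a fixed pattern of indices inside each group of four is what gives the vectorizer a chance to map it to a single shuffle with a constant selector.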
>>...So what kind of C code would usually generate shufps by the compiler?
I agree with Tim that your question is hard to answer, so I looked at the Intel headers that declare the intrinsic functions. Here are some details:
/*
 * Shuffle Packed Single Precision Floating-Point Values
 * **** VSHUFPS ymm1, ymm2, ymm3/m256, imm8
 * Moves two of the four packed single-precision floating-point values
 * from each double qword of the first source operand into the low
 * quadword of each double qword of the destination; moves two of the four
 * packed single-precision floating-point values from each double qword of
 * the second source operand into the high quadword of each double qword
 * of the destination. The selector operand determines which values are moved
 * to the destination.
 */
extern __m256 __ICL_INTRINCC _mm256_shuffle_ps(__m256, __m256, const int);
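The header comment is fairly dense, so here is a scalar reference model of the behavior it documents. It is only an illustration: vshufps_ref is a hypothetical helper, not part of any Intel header. Each 128-bit lane (a "double qword") of the result takes two elements from the first source and two from the second, and each element is chosen by a 2-bit field of imm8. The same imm8 is used for both lanes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not an Intel API): scalar model of
   VSHUFPS ymm1, ymm2, ymm3, imm8 on plain float[8] arrays. */
static void vshufps_ref(float dst[8], const float src1[8],
                        const float src2[8], int imm8)
{
    assert(imm8 >= 0 && imm8 <= 255); /* the selector is an 8-bit immediate */

    float tmp[8]; /* build the result here so dst may alias src1 or src2 */
    for (int lane = 0; lane < 2; ++lane) { /* two 128-bit lanes */
        const float *a = src1 + 4 * lane;
        const float *b = src2 + 4 * lane;
        float *d = tmp + 4 * lane;
        /* Low quadword of the lane: two values from the first source. */
        d[0] = a[(imm8 >> 0) & 3];
        d[1] = a[(imm8 >> 2) & 3];
        /* High quadword of the lane: two values from the second source. */
        d[2] = b[(imm8 >> 4) & 3];
        d[3] = b[(imm8 >> 6) & 3];
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

For example, imm8 = 0xB1 (binary 10 11 00 01) yields {src1[1], src1[0], src2[3], src2[2]} in the low lane and the corresponding elements 4 to 7 in the high lane.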
A very generic answer: a C/C++ compiler will generate the instruction if the code uses the _mm256_shuffle_ps intrinsic function or contains inline assembly for the instruction (assuming that generation of AVX instructions is enabled).
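For the 128-bit SSE form that the original question asks about, the corresponding intrinsic is _mm_shuffle_ps from xmmintrin.h. It typically compiles to shufps (or vshufps when AVX code generation is enabled), although an optimizing compiler is free to choose an equivalent instruction. Here is a minimal sketch, assuming an x86 target with SSE (always available on x86-64); shuffle_pairs and reverse4 are hypothetical names. The _MM_SHUFFLE macro packs four 2-bit indices into the selector, which must be a compile-time constant.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE */

/* _MM_SHUFFLE(d3, d2, d1, d0): d0 and d1 index the first operand,
   d2 and d3 index the second operand. */
static_assert(_MM_SHUFFLE(3, 2, 1, 0) == 0xE4, "identity selector");

/* Result is {a[1], a[0], b[3], b[2]}. */
static void shuffle_pairs(float out[4], const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 r = _mm_shuffle_ps(va, vb, _MM_SHUFFLE(2, 3, 0, 1));
    _mm_storeu_ps(out, r);
}

/* Passing the same vector twice turns shufps into a one-input permute:
   result is {v[3], v[2], v[1], v[0]}. */
static void reverse4(float out[4], const float in[4])
{
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

_mm256_shuffle_ps takes the same kind of selector and applies it to each 128-bit lane independently, as the header comment describes. It requires AVX to be enabled, for example with -mavx in GCC or Clang.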
Also, look at the Intel Instruction Set Reference Manual (Volumes 2A, 2B and 2C) for a more detailed description of the instruction.
Sorry for the confusion. I meant to ask what kind of C code could possibly be translated into shufps by the compiler. I think the problem has been solved. Thank you guys! :-)