Recently I ran into some problems with icl's auto-vectorization. My application has some loops whose bodies contain float-to-BYTE conversions. There is no float-to-BYTE conversion SIMD instruction, only a float-to-int conversion, and there is no int-to-BYTE conversion SIMD instruction either.
In this situation, my idea is to store the float-to-int conversion results in temporary memory and then convert the ints to bytes myself using PACK instructions.
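A minimal sketch of that two-step workaround, assuming SSE2 and assuming that saturating to the range 0..255 is acceptable (the post does not say whether saturation or plain truncation is wanted). The function names `floats_to_ints` and `pack_ints_to_bytes` are hypothetical, chosen only for illustration:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Step 1: plain float -> int32 loop writing to temporary memory.
   This is the part left to the auto-vectorizer. Assumes every
   input value fits in an int32. */
void floats_to_ints(const float *src, int32_t *tmp, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        tmp[i] = (int32_t)src[i];  /* truncates toward zero */
}

/* Step 2: hand-written int32 -> uint8 conversion using pack
   instructions. Values are clamped to 0..255. */
void pack_ints_to_bytes(const int32_t *tmp, uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(tmp + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(tmp + i + 4));
        __m128i c = _mm_loadu_si128((const __m128i *)(tmp + i + 8));
        __m128i d = _mm_loadu_si128((const __m128i *)(tmp + i + 12));
        /* int32 -> int16 with signed saturation (PACKSSDW) */
        __m128i ab = _mm_packs_epi32(a, b);
        __m128i cd = _mm_packs_epi32(c, d);
        /* int16 -> uint8 with unsigned saturation (PACKUSWB) */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(ab, cd));
    }
    /* Scalar tail with the same clamping as the packed path. */
    for (; i < n; ++i) {
        int32_t v = tmp[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}
```

Calling `floats_to_ints(src, tmp, n)` followed by `pack_ints_to_bytes(tmp, dst, n)` gives the byte stream; the extra pass over `tmp` is the cost in question.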
If there were an int-to-byte or float-to-byte conversion SIMD instruction, the temporary memory would not be necessary, and the performance should be better.
I ran some tests: for a byte-to-int stream conversion, icl uses unpack instructions to unpack the bytes and then stores the unpacked results to the int destination.
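For reference, a sketch of the kind of byte-to-int loop meant here, together with a hand-written SSE2 version that widens with unpack instructions. The second function only approximates what a vectorizer might generate; it is not icl's actual output, and both function names are hypothetical:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar form: a simple widening loop over a byte stream. */
void bytes_to_ints(const uint8_t *src, int32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Hand-written equivalent: zero-extend 16 bytes to 16 int32 values
   by interleaving with zero (PUNPCKLBW/PUNPCKHBW, then
   PUNPCKLWD/PUNPCKHWD). */
void bytes_to_ints_sse2(const uint8_t *src, int32_t *dst, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i b  = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i w0 = _mm_unpacklo_epi8(b, zero);  /* bytes 0..7  -> 16-bit */
        __m128i w1 = _mm_unpackhi_epi8(b, zero);  /* bytes 8..15 -> 16-bit */
        _mm_storeu_si128((__m128i *)(dst + i),      _mm_unpacklo_epi16(w0, zero));
        _mm_storeu_si128((__m128i *)(dst + i + 4),  _mm_unpackhi_epi16(w0, zero));
        _mm_storeu_si128((__m128i *)(dst + i + 8),  _mm_unpacklo_epi16(w1, zero));
        _mm_storeu_si128((__m128i *)(dst + i + 12), _mm_unpackhi_epi16(w1, zero));
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; ++i)
        dst[i] = src[i];
}
```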
So I wonder: since icl can use unpack instructions to unpack BYTEs, why can't it use PACK instructions to pack ints? Or could a pragma be added to tell the compiler to pack ints?