Penalty when mixing AVX and SSE



My code uses AVX most of the time.

However, from time to time I have to work with vectors of four floats, and for those I use SSE intrinsics such as _mm_sub_ps and _mm_add_ps. I read that there is a huge penalty involved here.

Why is that, and how large is the penalty? Should I even use scalar operations for the vec4s instead?

How large is the penalty when a function that is already converted to AVX calls an old function that uses SSE (not inlined)?

All in all, the speedup from using 64-bit and AVX is currently only about 30 percent for my app; maybe I can get more out of it.



If I understand your code correctly, you have a bunch of AVX instructions with a few SSE instructions in the middle that work on 4 elements. If this code is written with intrinsics, there should not be any penalty. The reason is that almost all 4-element float (128-bit SSE) instructions also have an AVX encoding. When you compile with the AVX switch, the compiler is smart enough to generate the 128-bit AVX instructions even though you are using the old SSE intrinsics. So in the end you have full AVX code, a mix of 256-bit and 128-bit instructions.
The issue arises only when a precompiled SSE library (built without the AVX switch) is linked to your AVX code and the AVX code calls into that SSE code. The penalty comes from switching between 256-bit AVX instructions and legacy-encoded SSE instructions while the upper halves of the YMM registers are in use. You can easily avoid it by issuing vzeroupper before the library call if you are not sure.
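As a rough sketch of the vzeroupper advice above: the function names below are hypothetical, and legacy_sse_scale4 only stands in for a routine that would really live in a separately compiled SSE-only library. The `target("avx")` attribute is a GCC/Clang feature; with other compilers the file would instead be built with the AVX switch.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical stand-in for a routine from a precompiled, SSE-only library.
   In a real build it would sit in another object file compiled without AVX,
   so it would use the legacy (non-VEX) SSE encoding. */
static void legacy_sse_scale4(float *v, float s)
{
    assert(v != NULL);
    _mm_storeu_ps(v, _mm_mul_ps(_mm_loadu_ps(v), _mm_set1_ps(s)));
}

/* 256-bit AVX work, then a call into the legacy routine. */
__attribute__((target("avx")))
static void add8_then_scale_first4(float *dst, const float *a,
                                   const float *b, float s)
{
    __m256 sum = _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
    _mm256_storeu_ps(dst, sum);

    /* Emits vzeroupper: zeroes the upper 128 bits of all YMM registers,
       so the legacy SSE code that follows does not hit the transition
       penalty. The 256-bit result was already stored above. */
    _mm256_zeroupper();

    legacy_sse_scale4(dst, s);
}
```

Calling add8_then_scale_first4(dst, a, b, 2.0f) adds eight floats and then doubles the first four. Note that _mm256_zeroupper() should come after the last use of any live 256-bit value, since it discards the upper halves.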


Yes, but I read here that some instructions, such as the integer instructions, are not yet in AVX.

What about them? I have to convert 48 floats to integers; I guess it still makes sense to use SSE rather than some scalar option.
Anyway, where is the cost of switching to SSE for integer instructions described?


Best Reply

Most (almost all) integer instructions are promoted to 128-bit AVX (not to 256-bit), so you won't get any penalty. I am not sure where you read that, but the point is that they are not 256-bit AVX instructions; they are 128-bit AVX instructions. If you write your code with those instructions, the compiler will generate the AVX form (given the right switches) and there will not be any penalty. Let me know if there is a particular instruction or set of instructions that you are using and that gives you bad performance due to non-AVX code.
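For the 48-float conversion mentioned earlier in the thread, a minimal sketch with plain SSE2 intrinsics could look like this. The function name is hypothetical, and truncation toward zero is an assumption, since the thread does not say which rounding is wanted. Built with the AVX switch, the compiler is expected to emit the 128-bit VEX-encoded forms of these instructions, as described above.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts n floats to 32-bit integers, four at a
   time, truncating toward zero. n must be a multiple of 4 (48 is). */
static void floats_to_ints(const float *src, int32_t *dst, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128  f = _mm_loadu_ps(src + i);          /* unaligned load of 4 floats */
        __m128i v = _mm_cvttps_epi32(f);            /* float -> int32, truncating */
        _mm_storeu_si128((__m128i *)(dst + i), v);  /* unaligned store of 4 ints  */
    }
}
```

For 48 floats the call is floats_to_ints(src, dst, 48). This particular conversion also exists in a 256-bit AVX form, _mm256_cvttps_epi32, which handles eight floats per instruction.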
