Mixing AVX and MMX code

Dear all,

I hope this hasn't been asked before, but I couldn't find a way to search the forum.

In high-performance code I'm using MMX and SSE together, since this gives me 8 additional, very valuable registers. Looking at the AVX docs, this no longer seems possible with AVX code: none of the MMX-related SSE instructions have been promoted to a VEX-prefixed form, so they are legacy instructions which I may no longer use (or else face the deadly mixing penalty that requires VZEROUPPER etc.).

Is it correct that it's no longer possible to make heavy use of MMX registers in AVX code?

Can I at least continue using MMX registers without performance impact as long as there is no data transfer between MMX and SSE registers?

Thanks,

Elmar

It is not clear whether you're using assembly language (inline in C/C++ code) or Intel intrinsic functions. If you're using the Intel C++ compiler, take a look at the sse2mmx.h header file in the ..\Compiler\Include folder.

Not a direct answer to your question, sorry, but I strongly suggest porting your MMX code to SSE2. Even though you will have fewer logical registers than with your 8 MMX + 8 XMM combination (or 8 MMX + 16 XMM in 64-bit mode), you should measure good speedups thanks to the doubled throughput, which easily offsets the smaller register count. I experienced exactly that a long time ago.

Then you'll be able to compile your code for AVX (the very same source code if you use intrinsics). If you want the same source code for AVX2 targets (with doubled throughput again), it will be more challenging with intrinsics, though.

Quote:

Sergey Kostrov wrote:

It is not clear whether you're using assembly language (inline in C/C++ code) or Intel intrinsic functions. If you're using the Intel C++ compiler, take a look at the sse2mmx.h header file in the ..\Compiler\Include folder.

Many thanks for your quick reply. I'm actually using my own code generator that creates assembly code for NASM. I'm now adding code paths for AVX and AVX2, and I hoped that an Intel insider could tell me what is and is not allowed regarding MMX.

For example, if I read the AVX docs correctly, the instruction MOVQ2DQ XMM0,MM0 is no longer allowed in AVX code because there is no VEX prefix version (and thus a huge penalty for mixing AVX and legacy code).

If this is correct, is it at least allowed to mix MMX-only code with AVX code? (I.e. code that runs entirely in MMX registers and never transfers data to an SSE register, or goes through memory for the transfer) Or are there other hidden pitfalls?
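The "goes through memory" option mentioned above can be sketched as a code-generator helper. This is only an illustration under stated assumptions: the generator's real language and structure are not given in the thread, so Python is used here, and the function names `emit_mmx_to_xmm` / `emit_xmm_to_mmx` and the `scratch` memory operand are hypothetical. Whether MOVQ2DQ really triggers a transition penalty is the open question of this thread; the sketch simply avoids legacy-encoded instructions that touch XMM registers in the AVX path.

```python
def emit_mmx_to_xmm(xmm, mm, use_vex, scratch="qword [scratch]"):
    """Return NASM lines that copy 64 bits from an MMX to an XMM register.

    `scratch` is a hypothetical 8-byte memory operand chosen by the caller.
    """
    if not use_vex:
        # Legacy SSE2 code path: direct register-to-register move.
        return [f"movq2dq {xmm}, {mm}"]
    # AVX code path: MOVQ2DQ has no VEX form, so route the data via memory.
    return [
        f"movq {scratch}, {mm}",    # MMX store; does not touch XMM/YMM state
        f"vmovq {xmm}, {scratch}",  # VEX-encoded load; upper bits are zeroed
    ]


def emit_xmm_to_mmx(mm, xmm, use_vex, scratch="qword [scratch]"):
    """Return NASM lines that copy the low 64 bits of an XMM to an MMX register."""
    if not use_vex:
        # Legacy SSE2 code path.
        return [f"movdq2q {mm}, {xmm}"]
    # AVX code path: VEX-encoded store, then a plain MMX load.
    return [
        f"vmovq {scratch}, {xmm}",
        f"movq {mm}, {scratch}",
    ]


if __name__ == "__main__":
    for line in emit_mmx_to_xmm("xmm0", "mm0", use_vex=True):
        print(line)
```

The memory round trip costs one store and one load per transfer, so it only pays off if such transfers are rare compared with the work done inside each register set.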

(Please don't suggest to simply stop using MMX, I'm gaining a lot of performance by using MMX for short vectors up to 8 bytes long, which would otherwise have to be spilled to memory, since my SSE/AVX registers are always full to the limit, especially in 32bit mode).

Thanks,

Elmar

>>...Please don't suggest to simply stop using MMX, I'm gaining a lot of performance by using MMX for short vectors

I support that firm position.

>>up to 8 bytes long, which would otherwise have to be spilled to memory, since my SSE/AVX registers are always full to
>>the limit, especially in 32bit mode)...

Please take a look at the very good article "Avoiding AVX-SSE Transition Penalties" (attached). Even though it is not about AVX-to-MMX transitions, it has lots of technical details and recommendations.

>>>Is it correct that it's no longer possible to make heavy use of MMX registers in AVX code?>>>

I wonder if it is even possible?

Quote:

iliyapolak wrote:

>>>Is it correct that it's no longer possible to make heavy use of MMX registers in AVX code?>>>

I wonder if it is even possible?

There is no reason it shouldn't be possible, since only the physical registers are shared. The 8 MMX architected registers and the 16 (8 in 32-bit mode) YMM architected registers are fully separate: modifying a register in one set has no side effect on any register in the other set. A transition does occur between SSE and AVX, though, since the XMM architected state aliases the YMM state (much like the MMX state aliases the x87 stack).

Based on this, I don't see why there would be a "transition penalty", since there is actually no transition, just like when you mix x87 code and AVX code. I know from hands-on experience that the latter works without problems (no issues reported by VTune Amplifier, for example). I suppose it's exactly the same with MMX code, since the MMX state aliases the x87 state but has no intersection with the XMM/YMM state. Here, though, I lack first-hand experience, since I no longer have MMX code in my code base.
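The aliasing argument above can be illustrated with a toy model of the architected state. This is a sketch only, not a simulator and not a statement about performance: the class `ToyRegisterFile` and its method names are made up for illustration, and the model covers just which register names share storage (MMn with the low 64 bits of x87 physical register Rn, XMMn with the low 128 bits of YMMn).

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


class ToyRegisterFile:
    """Toy model of which x86 register names share architected storage."""

    def __init__(self, num_vector_regs=16):
        # x87/MMX storage: eight 80-bit slots; MMn is the low 64 bits of slot n.
        self.x87 = [0] * 8
        # Vector storage: XMMn is the low 128 bits of YMMn.
        self.ymm = [0] * num_vector_regs

    def write_mmx(self, n, value):
        # An MMX write also sets the top 16 bits of the 80-bit slot to ones.
        self.x87[n] = (0xFFFF << 64) | (value & MASK64)

    def read_mmx(self, n):
        return self.x87[n] & MASK64

    def write_xmm_legacy(self, n, value):
        # Legacy SSE write: bits 128..255 of YMMn are preserved.
        upper = self.ymm[n] & ~MASK128
        self.ymm[n] = upper | (value & MASK128)

    def write_xmm_vex(self, n, value):
        # VEX.128 write: bits 128..255 of YMMn are zeroed.
        self.ymm[n] = value & MASK128

    def write_ymm(self, n, value):
        self.ymm[n] = value & MASK256


if __name__ == "__main__":
    rf = ToyRegisterFile()
    rf.write_ymm(0, (0xAB << 128) | 0x11)
    rf.write_mmx(0, 0x1234)          # touches only the x87/MMX slots
    print(hex(rf.ymm[0]), hex(rf.read_mmx(0)))
```

In this model an MMX write never changes a YMM slot, whereas a legacy SSE write and a VEX write to the same XMM name differ in what they do to the upper half of the YMM register. That difference is where the SSE/AVX transition comes from; nothing comparable exists between the MMX and YMM sets.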

Instead of worrying about the loss of 8 MMX registers, think about how best to use the other half of the 16 YMM registers.

Jim Dempsey

>>>Instead of worrying about the loss of 8 MMX registers, think about how best to use the other half of the 16 YMM registers.>>>

Very true.
