Can I avoid usage of movdqa and movaps on stack variables with -O2?



We use icc 12.1 on x86-32 Linux. When we compile our code with -O2, many instructions like these are generated:

movdqa XMMWORD PTR [esp+0xe0],xmm0

movaps XMMWORD PTR [esp+48],xmm7
Since the memory operands of movdqa and movaps must be aligned to 16 bytes, and these operands are addressed relative to esp, this obviously requires that the stack itself be aligned to 16 bytes.
When we compile our code into a standalone binary, everything works fine. Does the compiler take care of proper stack alignment in that case?
However, we also compile our code into a shared object that is loaded into a Java virtual machine. Our code is then called through JNI and frequently crashes because the stack is not 16-byte aligned when one of these instructions executes. The misaligned access results in a SIGSEGV.
The problem seems to go away when using -O1 instead of -O2, and in fact, the crashing function no longer contains movdqa/movaps in that case.
We also link object files generated with 'icc -O2' with code compiled by g++ that calls into them (an old version of g++ that does not have -mstackrealign). The same problem could potentially arise there.
Is there any way to compile with -O2 but force icc not to assume that the stack is aligned, so that we don't get instructions that require stack alignment but can still benefit from -O2?
If not, is there a way to force the compiler to generate a prologue for functions that ensures proper stack alignment?
Or is there another way to make sure the stack is aligned properly when the offending instructions are executed?

Thanks a lot



It seems that adding '-falign-stack=assume-4-byte' to the compiler options cures the problem.
Is this expected? Is this the right thing to do?





Yes, I think that's the way to go here. I'd suggest testing "-falign-stack=maintain-16-byte", though. The reason is that if the stack happens to be 16-byte aligned already, for whatever reason, the compiler can take advantage of that instead of falling back to enforced unaligned access.

I don't have a JNI example at hand right now, but I'd like to hear whether "-falign-stack=maintain-16-byte" works for you as well.

Best regards,

Georg Zitzlsberger

Thanks a lot for the tip!
-falign-stack=maintain-16-byte seems to work as well. All the test cases that crashed before now pass (just as they did with assume-4-byte). We will go with maintain-16-byte then.

Thank you again,

