How do I avoid XMM book-keeping code around __asm blocks in 64-bits


Greetings,

When I write x86_64 assembly blocks, I see that the compiler generates book-keeping code to preserve the values of XMM8 to XMM15. So I tend to avoid those registers, but sometimes we really need all 16 XMM registers. The problem is that this book-keeping code is a fixed cost that could be avoided, and sometimes it cancels out our optimizations.

Is there any way to avoid preserving those registers? Is there a calling convention that allows this?

Many thanks,

Guillaume Piolat


- Is it possible to provide an example (C/C++ and assembler code) that demonstrates the issue?

- Could you use the Intel® Software Development Emulator (Intel® SDE) to verify that you don't have SSE-to-AVX and AVX-to-SSE transitions? (This is only my suggestion, and I could be wrong.)

Hi Sergey, here is sample code that does this.

void test()
{
    __asm
    {
        pxor xmm8, xmm8  // could be whatever using xmm8
    }
}

int main(int argc, char* argv[])
{
    #pragma noinline  // Intel C++ Compiler: do not inline calls in the next statement
    test();
}

The generated code for the test function is:

sub rsp, 56
movaps XMMWORD PTR [32+rsp], xmm8
pxor xmm8, xmm8
movaps xmm8, XMMWORD PTR [32+rsp]
add rsp, 56
ret

The compiler is able to see that we don't modify rbx, rbp, rsi, rdi, r12, r13, r14, r15, xmm6, xmm7, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14 or xmm15, so it doesn't generate book-keeping code for them. Still, that's a lot of registers we can't use without a penalty!
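A rough way to see where the fixed cost comes from: under the Microsoft x64 calling convention, XMM0-XMM5 are volatile and XMM6-XMM15 are nonvolatile, so every nonvolatile XMM register an __asm block writes costs one spill and one reload. The C++ sketch below is a simplified model, not what any compiler actually runs; the names SaveRestore and xmmSaveRestore are hypothetical, and it ignores shadow space and stack alignment padding.

```cpp
#include <bitset>
#include <cassert>
#include <string>
#include <vector>

// Simplified model of the save/restore code a Windows x64 compiler wraps
// around an __asm block, assuming the Microsoft x64 rule that XMM0-XMM5 are
// volatile and XMM6-XMM15 are nonvolatile (callee-saved).
// Shadow space and alignment padding of the stack frame are NOT modeled.
struct SaveRestore {
    std::vector<std::string> prologue;  // spills emitted before the __asm body
    std::vector<std::string> epilogue;  // reloads emitted after the __asm body
};

// clobbered: bit r set means the __asm block writes xmm<r>.
// firstSlotOffset: rsp-relative offset of the first 16-byte spill slot.
SaveRestore xmmSaveRestore(const std::bitset<16>& clobbered, int firstSlotOffset) {
    // movaps needs 16-byte aligned addresses; assumes rsp itself is aligned.
    assert(firstSlotOffset % 16 == 0);
    SaveRestore code;
    int offset = firstSlotOffset;
    for (int r = 6; r < 16; ++r) {  // XMM0-XMM5 are volatile: never saved
        if (!clobbered.test(r)) continue;
        const std::string reg = "xmm" + std::to_string(r);
        const std::string slot = "XMMWORD PTR [" + std::to_string(offset) + "+rsp]";
        code.prologue.push_back("movaps " + slot + ", " + reg);
        code.epilogue.push_back("movaps " + reg + ", " + slot);
        offset += 16;  // one 16-byte slot per preserved register
    }
    return code;
}
```

For an __asm block that writes only xmm8, with the first slot at offset 32, this model produces the same movaps pair as the listing above; a block that uses all 16 XMM registers would pay for ten spills and ten reloads on every call.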

As for SSE-to-AVX transitions, I don't think we use any AVX code.

>>...The compiler is able to see that we do'nt modify rbx, rbp, rsi, rdi...

Your problem could be turned into a good feature request, something like:

#pragma donotsavexmmregs

Did you consider a pure-assembler implementation of some functions you need?

> Did you consider a pure-assembler implementation of some functions you need?

Not yet. I think the "problem" is that the compiler keeps register values to conform with the calling convention, yet I'm not sure which other calling conventions would allow using more registers.
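On the question of other calling conventions: the System V AMD64 ABI used on Linux and macOS treats all XMM registers as volatile (caller-saved), whereas the Microsoft x64 convention makes XMM6-XMM15 nonvolatile. The sketch below only encodes those two rules; the names Abi, isCalleeSavedXmm and xmmSavesRequired are hypothetical.

```cpp
#include <bitset>
#include <cassert>

// Two x86-64 calling conventions and their rules for XMM registers.
enum class Abi { MicrosoftX64, SystemVAmd64 };

// Returns true if a callee must preserve xmm<reg> under the given ABI.
bool isCalleeSavedXmm(Abi abi, int reg) {
    assert(reg >= 0 && reg < 16);
    switch (abi) {
        case Abi::MicrosoftX64: return reg >= 6;  // XMM6-XMM15 are nonvolatile
        case Abi::SystemVAmd64: return false;     // every XMM register is volatile
    }
    return false;  // not reached
}

// Counts how many of the clobbered XMM registers the callee must save and restore.
int xmmSavesRequired(Abi abi, const std::bitset<16>& clobbered) {
    int saves = 0;
    for (int r = 0; r < 16; ++r) {
        if (clobbered.test(r) && isCalleeSavedXmm(abi, r)) ++saves;
    }
    return saves;
}
```

Keep in mind that a convention with volatile XMM registers only moves the cost: any caller that keeps live values in those registers has to save them itself around the call.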

>>...Not yet. I think the "problem" is that the compiler keep register values to conform with the calling convention...

I think this is the only solution at the moment (I mean a pure-assembler implementation). For example, the TBB library has the same issues, and a small set of its functions is implemented in pure assembler.

Note: the __declspec(naked) attribute would not help either, because it is not supported on 64-bit platforms.

I suggest you write the function as a separate function in C++, compile it with an assembler listing, and edit the listing to remove what you think is unnecessary code. Then remove the .cpp file from the project and add the .asm file instead (you may need to adjust the solution/makefile to generate the .obj from the .asm). I've had to rely on this myself because the inline assembler didn't support the code I wanted.

Jim Dempsey

www.quickthreadprogramming.com
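The first step of that workflow might look like the following sketch. The kernel scale_add is hypothetical; the point is that extern "C" keeps its symbol name unmangled, so an edited .asm file that defines the same symbol can replace the .cpp file without changing any callers. With the Microsoft compiler, /FAs writes an assembly listing interleaved with the source; check your compiler's documentation for its equivalent option. Callers compiled for the Microsoft x64 convention still expect XMM6-XMM15 to be preserved, so removing the save/restore code for those registers is only safe if you control every caller and know it does not keep live values in them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel used to illustrate the workflow.
// extern "C" keeps the symbol unmangled (scale_add), so a hand-edited .asm
// file exporting the same name can later replace this translation unit.
extern "C" void scale_add(float* dst, const float* src, float k, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += k * src[i];  // simple loop the compiler may vectorize with XMM registers
    }
}
```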
