There is something wrong with using SVML in inline ASM

     I tried calling __svml_sin4 from inline ASM the way the compiler does. A code snippet follows:

     "vmovupd (%1), %%ymm0\n\t"
     "call __svml_sin4\n\t"
     "vmovupd %%ymm0, (%0)\n\t"
     "sub $1, %%rax\n\t"
     "jnz 3b\n\t"

    The program builds, but the output values are wrong.

    I then used GDB to locate the problem. It seems that the SVML function __svml_sin4 uses the general-purpose registers rax, rbx, rcx, rdx and so on without saving them first. So I want to save the registers modified by SVML myself. The problem is that I do not know exactly which registers are modified; different SVML functions may use different registers.

    So, does anybody know how to use SVML in inline assembly correctly?

    Thanks in advance for any answer.

According to the x86-64 System V ABI (section 3.2.1), only the rbp, rbx and r12-r15 general-purpose registers need to be preserved by the called function. All other general-purpose registers can be clobbered. I believe this applies to all UNIX-like systems.

The convention on Windows is summarized here:

If you need to preserve values of clobbered registers you should save and restore them around the function call.
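
As a rough illustration of the System V case, here is a minimal sketch (assuming GCC or Clang on x86-64 Linux) of a call made from inline assembly in which every caller-saved register is declared as a clobber, so the compiler takes care of saving and restoring whatever it keeps there. `triple` and `call_from_asm` are hypothetical names, with `triple` standing in for the library routine; the red-zone and stack-alignment handling reflects what a hand-written call generally needs and is not taken from any SVML documentation.

```cpp
#include <cstdint>

// Hypothetical stand-in for a library routine such as __svml_sin4.
extern "C" __attribute__((noinline)) std::int64_t triple(std::int64_t x) {
    return 3 * x;
}

std::int64_t call_from_asm(std::int64_t x) {
#if defined(__x86_64__) && !defined(_WIN32)
    std::int64_t result;
    std::int64_t arg = x;  // first integer argument travels in rdi
    __asm__ volatile(
        "mov  %%rsp, %%rbx\n\t"  // remember the stack pointer in rbx
        "sub  $128, %%rsp\n\t"   // step over the red zone the compiler may use
        "and  $-16, %%rsp\n\t"   // 16-byte stack alignment at the call
        "call *%[fn]\n\t"
        "mov  %%rbx, %%rsp\n\t"  // restore the stack pointer
        : "=a"(result), "+D"(arg)
        : [fn] "r"(&triple)
        // rbx is changed by this asm; the rest are the registers the callee
        // is allowed to clobber (assumption: no x87/MMX state is live).
        : "rbx", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15", "memory", "cc");
    return result;
#else
    return triple(x);  // fallback where this asm does not apply
#endif
}
```

With the clobber list in place, the compiler will not keep live values in those registers across the asm statement, which is usually cheaper than pushing and popping everything by hand.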


Hi, andysem!   

Thank you for your answer. It is very helpful.

Now the problem is that I do not know which registers are clobbered. So, if I need to preserve the state, I must save all the registers except rbp, rbx and r12-r15. That seems too expensive! Do you have any ideas about that?

Thanks again!

You have to assume that any register that is not required to be preserved can be clobbered. You don't have to save all registers, only those holding data that your program (i.e. the calling function) still needs. Compilers usually keep a shadow copy of variables on the stack so that the values can be saved and restored when needed. Minimizing and scheduling these moves is one of the optimizations compilers perform, and one that you'll have to do manually in assembler code.
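
One way to apply this to a loop like the one in the original question is to keep the loop counter out of rax altogether. The sketch below (assuming GCC or Clang on x86-64 Linux) declares all caller-saved registers as clobbers, which leaves the compiler only callee-saved registers for the counter and the function pointer, so both survive every call without any push/pop. `add_one` and `apply_n_times` are hypothetical names used for illustration only.

```cpp
#include <cstdint>

// Hypothetical callee standing in for the routine called inside the loop.
extern "C" __attribute__((noinline)) std::int64_t add_one(std::int64_t x) {
    return x + 1;
}

// Calls add_one n times, feeding each result back in as the next argument.
std::int64_t apply_n_times(std::int64_t x, std::int64_t n) {
    if (n <= 0) return x;  // the asm loop below runs at least once
    std::int64_t acc = x;
#if defined(__x86_64__) && !defined(_WIN32)
    std::int64_t count = n;
    __asm__ volatile(
        "mov  %%rsp, %%rbx\n\t"  // saved stack pointer lives in callee-saved rbx
        "sub  $128, %%rsp\n\t"   // step over the red zone
        "and  $-16, %%rsp\n\t"   // 16-byte stack alignment at the call
        "1:\n\t"
        "mov  %%rax, %%rdi\n\t"  // previous result becomes the argument
        "call *%[fn]\n\t"
        "sub  $1, %[n]\n\t"      // counter sits in a register the callee preserves
        "jnz  1b\n\t"
        "mov  %%rbx, %%rsp\n\t"
        : "+a"(acc), [n] "+r"(count)
        : [fn] "r"(&add_one)
        : "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15", "memory", "cc");
#else
    for (std::int64_t i = 0; i < n; ++i) acc = add_one(acc);  // portable fallback
#endif
    return acc;
}
```

Only the running value in rax is expected to change across the call; everything else the loop relies on sits in registers the callee has to preserve.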

zhang y.,

Try this. It works for me with MinGW64 on Windows.

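#include <immintrin.h>

extern "C" __m256d __svml_sin4(const __m256d &a);

// Argument and result both travel in ymm0, as in the compiler-generated call.
__inline __m256d sin(const __m256d &a)
{
    __m256d    ret;

    __asm volatile
    (
        "vmovaps    %1, %%ymm0\n"
//        "push        %%rax\n"
//        "push        %%rax\n"
        "call        __svml_sin4\n"
//        "pop        %%rax\n"
//        "pop        %%rax\n"
        "vmovaps    %%ymm0, %0\n"
        // Only xmm0 is declared as clobbered here; the callee may also
        // modify other registers it is not required to preserve.
        : "=m"(ret) : "m"(a) : "%xmm0"
    );
    return ret;
}

// Usage:
    __m256d    src, ret;
    ret = sin(src);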

If something goes wrong, try uncommenting the push/pop lines.


