Inline Assembly

Microsoft* Style Inline Assembly

The Intel® C++ Compiler supports Microsoft-style inline assembly on Windows*, and on Linux* when the -use-msasm option is specified. See the Microsoft documentation for the proper syntax.

GNU*-like Style Inline Assembly (IA-32 architecture and Intel® 64 architecture only)

The Intel® C++ Compiler supports GNU-like style inline assembly. The syntax is as follows:

asm-keyword [ volatile-keyword ] ( asm-template [ asm-interface ] ) ;

The Intel® C++ Compiler also supports mixing UNIX* and Microsoft* style asm statements. Use the __asm__ keyword for GNU-style asm statements when using the -use-msasm option.

Note

The Intel® C++ Compiler supports gcc-style inline ASM if the assembler code uses AT&T* System V/386 syntax.

Syntax Element

Description

asm-keyword

Assembly statements begin with the keyword asm. Alternatively, either __asm or __asm__ may be used for compatibility.

When mixing UNIX* and Microsoft* style asm, the compiler accepts only the __asm__ keyword for GNU-style statements, because the asm and __asm keywords are reserved for Microsoft* style assembly statements.

volatile-keyword

If the optional keyword volatile is given, the asm is volatile. Two volatile asm statements are never moved past each other, and a reference to a volatile variable is not moved relative to a volatile asm. Alternate keywords __volatile and __volatile__ may be used for compatibility.

asm-template

The asm-template is a C language ASCII string that specifies how to output the assembly code for an instruction. Most of the template is a fixed string; everything but the substitution-directives, if any, is passed through to the assembler. The syntax for a substitution directive is a % followed by one or two characters.

asm-interface

The asm-interface consists of three parts:

  1. An optional output-list
  2. An optional input-list
  3. An optional clobber-list

These are separated by colon (:) characters. If the output-list is missing, but an input-list is given, the input list may be preceded by two colons (::) to take the place of the missing output-list. If the asm-interface is omitted altogether, the asm statement is considered volatile regardless of whether a volatile-keyword was specified.

output-list

An output-list consists of one or more output-specs separated by commas. For the purposes of substitution in the asm-template, each output-spec is numbered. The first operand in the output-list is numbered 0, the second is 1, and so on. Numbering is continuous through the output-list and into the input-list. The total number of operands is limited to 30 (i.e. 0-29).

input-list

Similar to an output-list, an input-list consists of one or more input-specs separated by commas. For the purposes of substitution in the asm-template, each input-spec is numbered, with the numbers continuing from those in the output-list.

clobber-list

A clobber-list tells the compiler that the asm uses or changes a specific machine register that is either coded directly into the asm or is changed implicitly by the assembly instruction. The clobber-list is a comma-separated list of clobber-specs.

input-spec

The input-specs tell the compiler about expressions whose values may be needed by the inserted assembly instruction. In order to describe fully the input requirements of the asm, you can list input-specs that are not actually referenced in the asm-template.

clobber-spec

Each clobber-spec specifies the name of a single machine register that is clobbered. The register name may optionally be preceded by a %. You can specify any valid machine register name. It is also legal to specify "memory" in a clobber-spec. This prevents the compiler from keeping data cached in registers across the asm statement.
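As a rough illustration of how the three parts of the asm-interface fit together, the following sketch shows one statement with an output-list, an input-list, and a clobber-list, plus a statement that uses only the "memory" clobber. The helper names asm_add and compiler_barrier are hypothetical and exist only for this example. The sketch assumes a compiler that predefines the GCC-style macros __x86_64__ or __i386__ and accepts the conventional GNU "cc" clobber for the flags; on other targets it falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper (not a compiler API): returns a + b.
 * Operands are numbered across the lists: %0 = sum, %1 = a, %2 = b. */
static inline uint32_t asm_add(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl %1, %%eax\n\t"   /* eax = a (literal registers are written %%reg) */
            "addl %2, %%eax\n\t"   /* eax += b */
            "movl %%eax, %0"       /* sum = eax */
            : "=r"(sum)            /* output-list: operand 0 */
            : "r"(a), "r"(b)       /* input-list: operands 1 and 2 */
            : "eax", "cc");        /* clobber-list: scratch register and flags */
#else
    sum = a + b;                   /* portable fallback on other targets */
#endif
    return sum;
}

/* Hypothetical helper: an empty template with a "memory" clobber, so the
 * compiler does not keep data cached in registers across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}
```

Calling asm_add(2, 3) returns 5. Listing eax in the clobber-list keeps the compiler from assigning any operand to that register, which is why the template can use it as scratch space.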

When compiling an assembly statement on Linux*, the compiler simply emits the asm-template to the assembly file after making any necessary operand substitutions. The compiler then calls the GNU* assembler to generate machine code. In contrast, on Windows* the compiler itself must assemble the text contained in the asm-template string into machine code. In essence, the compiler contains a built-in assembler.

The compiler’s built-in assembler supports the GNU* .byte directive but does not support other functionality of the GNU* assembler, so there are limitations in the contents of the asm-template. The following assembler features are not currently supported.

  • Directives other than the .byte directive

  • Symbols*

Note

* Direct symbol references in the asm-template are not supported. To access a C++ object, use the asm-interface with a substitution directive.

Example

Incorrect method for accessing a C++ object:

__asm__("addl $5, _x");

Proper method for accessing a C++ object:

__asm__("addl $5, %0" : "+rm" (x));

Additionally, there are some restrictions on the usage of labels. The compiler only allows local labels, and only references to labels within the same assembly statement are permitted. A local label has the form “N:”, where N is a non-negative integer. N does not have to be unique, even within the same assembly statement. To reference the most recent definition of label N, use “Nb”. To reference the next definition of label N, use “Nf”. In this context, “b” means backward and “f” means forward. For more information, refer to the GNU assembler documentation.
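The local-label rules above can be sketched with a small loop. The helper name asm_sum_to is hypothetical, and the sketch assumes an x86 target detected through the GCC-style macros __x86_64__ or __i386__, with a plain C fallback elsewhere. Both labels are defined and referenced inside the same assembly statement.

```c
#include <stdint.h>

/* Hypothetical helper: returns n + (n-1) + ... + 1, or 0 when n == 0. */
static inline uint32_t asm_sum_to(uint32_t n)
{
    uint32_t total = 0;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("testl %1, %1\n\t"   /* is n zero? */
            "jz 2f\n"            /* yes: jump forward to the next label 2 */
            "1:\n\t"             /* local label 1: top of the loop */
            "addl %1, %0\n\t"    /* total += n */
            "decl %1\n\t"        /* n -= 1 */
            "jnz 1b\n"           /* loop: jump backward to the latest label 1 */
            "2:"                 /* local label 2: loop exit */
            : "+r"(total), "+r"(n)   /* both operands are read and written */
            :
            : "cc");                 /* flags are modified */
#else
    while (n != 0) {             /* portable fallback on other targets */
        total += n;
        n--;
    }
#endif
    return total;
}
```

Because the labels are numeric, the same statement can be expanded several times in one function (for example through inlining) without producing duplicate symbol definitions.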

GNU-style inline assembly statements on Windows* use the same assembly instruction format as on Linux* which is often referenced as AT&T* assembly syntax. This means that destination operands are on the right and source operands are on the left. This operand order is the reverse of Intel assembly syntax.
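Subtraction makes the operand order easy to see, because swapping source and destination changes the result. The following is a minimal sketch with a hypothetical helper name, asm_sub, assuming an x86 target detected through the GCC-style macros __x86_64__ or __i386__.

```c
#include <stdint.h>

/* Hypothetical helper: returns a - b. */
static inline uint32_t asm_sub(uint32_t a, uint32_t b)
{
    uint32_t diff = a;
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T order: "subl source, destination", so this computes
     * diff = diff - b. The Intel-syntax equivalent would place the
     * destination first: sub diff, b. */
    __asm__("subl %1, %0"
            : "+r"(diff)   /* operand 0: destination, read and written */
            : "r"(b)       /* operand 1: source */
            : "cc");       /* flags are modified */
#else
    diff = a - b;          /* portable fallback on other targets */
#endif
    return diff;
}
```

With this definition, asm_sub(10, 3) returns 7. Writing the template as "subl %0, %1" would instead subtract diff from the register holding b and leave diff unchanged.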

Due to the limitations of the compiler's built-in assembler, many assembly statements that compile and run on Linux* will not compile on Windows*. On the other hand, assembly statements that compile and run on Windows* should also compile and run on Linux*.

This feature provides a high-performance alternative to Microsoft-style inline assembly statements when portability between operating systems is important. Its intended use is in small primitives where high-performance integration with the surrounding C++ code is essential.

#ifdef _WIN64 
#define INT64_PRINTF_FORMAT "I64" 
#else 
#define __int64 long long 
#define INT64_PRINTF_FORMAT "ll"
#endif 
#include <stdio.h> 
typedef struct {
    __int64 lo64;
    __int64 hi64; 
} my_i128; 
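/* out = in1 + in2 as 128-bit values: addq adds the low halves and sets the
   carry flag, adcq adds the high halves plus that carry.
   Operands: %0/%1 = out.lo64/out.hi64, %2/%3 = in2.lo64/in2.hi64,
   %4/%5 = in1.lo64/in1.hi64 (tied to %0/%1 by the "0" and "1" constraints). */
#define ADD128(out, in1, in2)                      \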
    __asm__("addq %2, %0; adcq %3, %1" :           \
            "=r"(out.lo64), "=r"(out.hi64) :       \
            "emr" (in2.lo64), "emr"(in2.hi64),     \
            "0" (in1.lo64), "1" (in1.hi64));

extern int 
main() 
{
    my_i128 val1, val2, result;
    val1.lo64 = ~0;
    val1.hi64 = 0;

    val2.lo64 = 1;
    val2.hi64 = 65;
    ADD128(result, val1, val2);
    printf("0x%016" INT64_PRINTF_FORMAT "x%016"   INT64_PRINTF_FORMAT "x\n",
            val1.hi64, val1.lo64);

    printf("+0x%016" INT64_PRINTF_FORMAT "x%016" INT64_PRINTF_FORMAT "x\n",
            val2.hi64, val2.lo64);

    printf("------------------------------------\n");
    printf("0x%016" INT64_PRINTF_FORMAT "x%016" INT64_PRINTF_FORMAT "x\n",
            result.hi64, result.lo64);
    return 0; 
}

This example, written for Intel® 64 architecture, shows how to use a GNU-style inline assembly statement to add two 128-bit integers. In this example, a 128-bit integer is represented as two __int64 objects in the my_i128 structure. The inline assembly statement used to implement the addition is contained in the ADD128 macro, which takes three my_i128 arguments representing three 128-bit integers. The first argument is the output. The next two arguments are the inputs. The example compiles and runs using the Intel® C++ Compiler on Linux* or Windows*, producing the following output.

0x0000000000000000ffffffffffffffff
+0x00000000000000410000000000000001
------------------------------------
0x00000000000000420000000000000000

In the GNU-style inline assembly implementation, the asm interface specifies all the inputs, outputs, and side effects of the asm statement, enabling the compiler to generate very efficient code.

mov       r13, 0xffffffffffffffff 
mov       r12, 0x000000000 
add       r13, 1 
adc       r12, 65

It is worth noting that when the compiler generates an assembly file on Windows*, it uses Intel syntax even though the assembly statement was written using AT&T* assembly syntax.

The compiler moves in1.lo64 into a register to match the constraint of operand 4. Operand 4's constraint of "0" indicates that it must be assigned the same location as output operand 0, and operand 0's constraint is "=r", indicating that it must be assigned an integer register. In this case, the compiler chooses r13. In the same way, the compiler moves in1.hi64 (operand 5, constraint "1") into register r12, the location of output operand 1.

The constraints for input operands 2 and 3 allow the operands to be assigned a register location ("r"), a memory location ("m"), or a constant signed 32-bit integer value ("e"). In this case, the compiler chooses to match operands 2 and 3 with the constant values 1 and 65, enabling the add and adc instructions to utilize the "register-immediate" forms.
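The effect of a multi-alternative constraint such as "emr" can be sketched with a single 64-bit add. The helper name asm_add64 is hypothetical, and the sketch assumes an Intel® 64 target detected through the GCC-style macro __x86_64__, with a plain C fallback elsewhere. Which alternative the compiler picks depends on the argument and the optimization level, so the comments describe what is permitted, not what a particular build will emit.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b using one addq instruction. */
static inline int64_t asm_add64(int64_t a, int64_t b)
{
#if defined(__x86_64__)
    /* "emr" lets the compiler place b in any one of:
     *   e: a constant that fits in a signed 32-bit immediate
     *   m: a memory location
     *   r: an integer register
     * A constant too large for "e" has to use one of the other two. */
    __asm__("addq %1, %0"
            : "+r"(a)      /* operand 0: read and written, in a register */
            : "emr"(b)     /* operand 1: immediate, memory, or register */
            : "cc");       /* flags are modified */
    return a;
#else
    return a + b;          /* portable fallback on other targets */
#endif
}
```

A call such as asm_add64(x, 65) gives an optimizing compiler the option of the register-immediate form, while asm_add64(x, 0x100000000LL) cannot use an immediate because the constant does not fit in 32 bits.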

The same operation is much more expensive using a Microsoft-style inline assembly statement, because the interface between the assembly statement and the surrounding C++ code is entirely through memory. Using Microsoft* assembly, the ADD128 macro might be written as follows.

#define ADD128(out, in1, in2)                      \
    {                                              \
        __asm mov rax, in1.lo64                    \
        __asm mov rdx, in1.hi64                    \
        __asm add rax, in2.lo64                    \
        __asm adc rdx, in2.hi64                    \
        __asm mov out.lo64, rax                    \
        __asm mov out.hi64, rdx                    \
     }

The compiler must add code before the assembly statement to move the inputs into memory, and it must add code after the assembly statement to retrieve the outputs from memory. This prevents the compiler from exploiting some optimization opportunities. Thus, the following assembly code is produced.

        mov       QWORD PTR [rsp+32], -1
        mov       QWORD PTR [rsp+40], 0
        mov       QWORD PTR [rsp+48], 1
        mov       QWORD PTR [rsp+56], 65

; Begin ASM

        mov       rax, QWORD PTR [rsp+32]
        mov       rdx, QWORD PTR [rsp+40]
        add       rax, QWORD PTR [rsp+48]
        adc       rdx, QWORD PTR [rsp+56]
        mov       QWORD PTR [rsp+64], rax
        mov       QWORD PTR [rsp+72], rdx

; End ASM

        mov       rdx, QWORD PTR [rsp+72]
        mov       r8, QWORD PTR [rsp+64]

The operation that took only four instructions and no memory references using GNU-style inline assembly takes twelve instructions with twelve memory references using Microsoft-style inline assembly.
