

6.43.2 Extended Asm - Assembler Instructions with C Expression Operands

The asm keyword allows you to embed assembler instructions within C code. With Extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels.

     asm [volatile] ( AssemblerTemplate : [OutputOperands] [ : [InputOperands] [ : [Clobbers] ] ] )
     
     asm [volatile] goto ( AssemblerTemplate : : [InputOperands] : [Clobbers] : GotoLabels )

To create headers compatible with ISO C, write __asm__ instead of asm and __volatile__ instead of volatile (see Alternate Keywords). There is no alternate for goto.

By definition, Extended asm is an asm statement that contains operands. To separate the classes of operands, you use colons. Basic asm statements contain no colons. (So, for example, asm("int $3") is Basic asm, and asm("int $3" : ) is Extended asm. See Basic Asm.)

Qualifiers

volatile
The typical use of Extended asm statements is to manipulate input values to produce output values. However, your asm statements may also produce side effects. If so, you may need to use the volatile qualifier to disable certain optimizations. See Volatile.

goto
This qualifier informs the compiler that the asm statement may perform a jump to one of the labels listed in the GotoLabels section. See GotoLabels.

Parameters

AssemblerTemplate
This is a literal string that contains the assembler code. It is a combination of fixed text and tokens that refer to the input, output, and goto parameters. See AssemblerTemplate.

OutputOperands
A comma-separated list of the C variables modified by the instructions in the AssemblerTemplate. See OutputOperands.

InputOperands
A comma-separated list of C expressions read by the instructions in the AssemblerTemplate. See InputOperands.

Clobbers
A comma-separated list of registers or other values changed by the AssemblerTemplate, beyond those listed as outputs. See Clobbers.

GotoLabels
When you are using the goto form of asm, this section contains the list of all C labels to which the AssemblerTemplate may jump. See GotoLabels.

Remarks

The asm statement allows you to include assembly instructions directly within C code. This may help you to maximize performance in time-sensitive code or to access assembly instructions that are not readily available to C programs.

Note that Extended asm statements must be inside a function. Only Basic asm may be outside functions (see Basic Asm). Functions declared with the naked attribute also require Basic asm (see Function Attributes).

While the uses of asm are many and varied, it may help to think of an asm statement as a series of low-level instructions that convert input parameters to output parameters. So a simple (if not particularly useful) example for i386 using asm might look like this:

     int src = 1;
     int dst;
     
     asm ("mov %1, %0\n\t"
         "add $1, %0"
         : "=r" (dst)
         : "r" (src));
     
     printf("%d\n", dst);

This code will copy src to dst and add 1 to dst.

6.43.2.1 Volatile

GCC's optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands are implicitly volatile.

Examples:

This i386 code demonstrates a case that does not use (or require) the volatile qualifier. If it is performing assertion checking, this code uses asm to perform the validation. Otherwise, dwRes is unreferenced by any code. As a result, the optimizers can discard the asm statement, which in turn removes the need for the entire DoCheck routine. By omitting the volatile qualifier when it isn't needed you allow the optimizers to produce the most efficient code possible.

     void DoCheck(uint32_t dwSomeValue)
     {
        uint32_t dwRes;
     
        // Assumes dwSomeValue is not zero.
        asm ("bsfl %1,%0"
          : "=r" (dwRes)
          : "r" (dwSomeValue)
          : "cc");
     
        assert(dwRes > 3);
     }

The next example shows a case where the optimizers can recognize that the input (dwSomeValue) never changes during the execution of the function and can therefore move the asm outside the loop to produce more efficient code. Again, using volatile disables this type of optimization.

     void do_print(uint32_t dwSomeValue)
     {
        uint32_t dwRes;
     
        for (uint32_t x=0; x < 5; x++)
        {
           // Assumes dwSomeValue is not zero.
           asm ("bsfl %1,%0"
             : "=r" (dwRes)
             : "r" (dwSomeValue)
             : "cc");
     
           printf("%u: %u %u\n", x, dwSomeValue, dwRes);
        }
     }

The following example demonstrates a case where you need to use the volatile qualifier. It uses the x86 RDTSC instruction, which reads the computer's time-stamp counter; as written, the code is for x86-64, since it uses the 64-bit %rdx register. Without the volatile qualifier, the optimizers might assume that the asm block will always return the same value and therefore optimize away the second call.

     uint64_t msr;
     
     asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
             "shl $32, %%rdx\n\t"  // Shift the upper bits left.
             "or %%rdx, %0"        // 'Or' in the lower bits.
             : "=a" (msr)
             :
             : "rdx");
     
     printf("msr: %llx\n", (unsigned long long) msr);
     
     // Do other work...
     
     // Reprint the timestamp
     asm volatile ( "rdtsc\n\t"    // Returns the time in EDX:EAX.
             "shl $32, %%rdx\n\t"  // Shift the upper bits left.
             "or %%rdx, %0"        // 'Or' in the lower bits.
             : "=a" (msr)
             :
             : "rdx");
     
     printf("msr: %llx\n", (unsigned long long) msr);

GCC's optimizers will not treat this code like the non-volatile code in the earlier examples. They do not move it out of loops or omit it on the assumption that the result from a previous call is still valid.

Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions. For example, on many targets there is a system register that controls the rounding mode of floating-point operations. Setting it with a volatile asm, as in the following PowerPC example, will not work reliably.

     asm volatile("mtfsf 255, %0" : : "f" (fpenv));
     sum = x + y;

The compiler may move the addition back before the volatile asm. To make it work as expected, add an artificial dependency to the asm by referencing a variable in the subsequent code, for example:

     asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
     sum = x + y;

Under certain circumstances, GCC may duplicate (or remove duplicates of) your assembly code when optimizing. This can lead to unexpected duplicate symbol errors during compilation if your asm code defines symbols or labels. Using %= (see AssemblerTemplate) may help resolve this problem.

6.43.2.2 Assembler Template

An assembler template is a literal string containing assembler instructions. The compiler will replace any references to inputs, outputs, and goto labels in the template, and then output the resulting string to the assembler. The string can contain any instructions recognized by the assembler, including directives. GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input. However, it does count the statements (see Size of an asm).

You may place multiple assembler instructions together in a single asm string, separated by the characters normally used in assembly code for the system. A combination that works in most places is a newline to break the line, plus a tab character to move to the instruction field (written as "\n\t"). Some assemblers allow semicolons as a line separator. However, note that some assembler dialects use semicolons to start a comment.

Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.

Accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Similarly, calling functions directly from an assembler template requires a detailed understanding of the target assembler and ABI.

Since GCC does not parse the AssemblerTemplate, it has no visibility of any symbols it references. This may result in GCC discarding those symbols as unreferenced unless they are also listed as input, output, or goto operands.

GCC can support multiple assembler dialects (for example, GCC for i386 supports "att" and "intel" dialects) for inline assembler. In builds that support this capability, the -masm option controls which dialect GCC uses as its default. The hardware-specific documentation for the -masm option contains the list of supported dialects, as well as the default dialect if the option is not specified. This information may be important to understand, since assembler code that works correctly when compiled using one dialect will likely fail if compiled using another.

Using braces in asm templates

If your code needs to support multiple assembler dialects (for example, if you are writing public headers that need to support a variety of compilation options), use constructs of this form:

     { dialect0 | dialect1 | dialect2... }

This construct outputs 'dialect0' when using dialect #0 to compile the code, 'dialect1' for dialect #1, etc. If there are fewer alternatives within the braces than the number of dialects the compiler supports, the construct outputs nothing.

For example, if an i386 compiler supports two dialects (att, intel), an assembler template such as this:

     "bt{l %[Offset],%[Base] | %[Base],%[Offset]}; jc %l2"

would produce the output:

     For att: "btl %[Offset],%[Base] ; jc %l2"
     For intel: "bt %[Base],%[Offset]; jc %l2"

Using that same compiler, this code:

     "xchg{l}\t{%%}ebx, %1"

would produce

     For att: "xchgl\t%%ebx, %1"
     For intel: "xchg\tebx, %1"

There is no support for nesting dialect alternatives. Also, there is no “escape” for an open brace ({), so do not use open braces in an Extended asm template other than as a dialect indicator.

Other format strings

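In addition to the tokens described by the input, output, and goto operands, there are a few special cases:

%%
Outputs a single '%' into the assembler code.

%=
Outputs a number that is unique to each instance of the asm statement in the entire compilation. This is useful for creating local labels and referring to them multiple times in a single template that generates multiple assembler instructions.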

6.43.2.3 Output Operands

An asm statement has zero or more output operands indicating the names of C variables modified by the assembler code.

In this i386 example, old (referred to in the template string as %0) and *Base (as %1) are outputs and Offset (%2) is an input:

     bool old;
     
     __asm__ ("btsl %2,%1\n\t" // Turn on zero-based bit #Offset in Base.
              "sbb %0,%0"      // Use the CF to calculate old.
        : "=r" (old), "+rm" (*Base)
        : "Ir" (Offset)
        : "cc");
     
     return old;
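A compilable sketch of the same idea follows, assuming GCC on i386 or x86-64 in the AT&T dialect; test_and_set_bit is a hypothetical name. One deliberate substitution: it uses setc instead of sbb, so that the bool output receives exactly 0 or 1, and it uses the "q" constraint so that on i386 the output lands in a register that has a byte-sized name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: sets bit Offset of *Base and returns its old value.
   Assumes Offset < 32; with a memory operand, btsl can address bits
   outside *Base for larger offsets. */
bool test_and_set_bit(uint32_t *Base, uint32_t Offset)
{
    bool old;

    __asm__ ("btsl %2,%1\n\t"   /* CF = old bit, then set the bit */
             "setc %0"          /* old = CF (exactly 0 or 1) */
             : "=qm" (old), "+rm" (*Base)
             : "Ir" (Offset)
             : "cc");

    return old;
}
```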

Operands use this format:

     [ [asmSymbolicName] ] "constraint" (cvariablename)

asmSymbolicName

Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (e.g. %[Value]). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.

When not using asmSymbolicNames, use the (zero-based) position of the operand in the list of operands in the assembler template. For example, if there are three output operands, use %0 in the template to refer to the first, %1 for the second, and %2 for the third.

constraint
Output constraints must begin with either "=" (a variable overwriting an existing value) or "+" (when reading and writing). When using "=", do not assume the location will contain the existing value (except when tying the variable to an input; see Input Operands).

After the prefix, there must be one or more additional constraints (see Constraints) that describe where the value resides. Common constraints include "r" for register and "m" for memory. When you list more than one possible location (for example "=rm"), the compiler chooses the most efficient one based on the current context. If you list as many alternates as the asm statement allows, you will permit the optimizers to produce the best possible code. If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, Local Reg Vars may provide a solution (see Local Reg Vars).

cvariablename
Specifies the C variable name of the output (enclosed by parentheses). Accepts any (non-constant) variable within scope.

Remarks:

The total number of input + output + goto operands has a limit of 30. Commas separate the operands. When the compiler selects the registers to use to represent the output operands, it will not use any of the clobbered registers (see Clobbers).

Output operand expressions must be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being executed. For output expressions that are not directly addressable (for example a bit-field), the constraint must allow a register. In that case, GCC uses the register as the output of the asm, and then stores that register into the output.

Unless an output operand has the '&' constraint modifier (see Modifiers), GCC may allocate it in the same register as an unrelated input operand, on the assumption that the assembler code will consume its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction. In this case, use '&' on each output operand that must not overlap an input.

The same problem can occur if one output parameter (a) allows a register constraint and another output parameter (b) allows a memory constraint. The code GCC generates to access the memory address in b can contain registers that might be shared by a, and GCC considers those registers to be inputs to the asm. As above, GCC assumes that such input registers are consumed before any outputs are written. This assumption may result in incorrect behavior if the asm writes to a before using b. Combining the '&' modifier with the register constraint ensures that modifying a does not affect the address referenced by b. Omitting the '&' modifier means that the location of b is undefined if a is modified before using b.

asm supports operand modifiers on operands (for example %k2 instead of simply %2). Typically these modifiers are hardware dependent. The list of supported modifiers for i386 is found at i386 Operand modifiers.

If the C code that follows the asm makes no use of any of the output operands, use volatile for the asm statement to prevent the optimizers from discarding the asm statement as unneeded (see Volatile).

Examples:

This code makes no use of the optional asmSymbolicName. Therefore it references the first output operand as %0 (were there a second, it would be %1, etc). The number of the first input operand is one greater than that of the last output operand. In this i386 example, that makes Mask %1:

     uint32_t Mask = 1234;
     uint32_t Index;
     
       asm ("bsfl %1, %0"
          : "=r" (Index)
          : "r" (Mask)
          : "cc");

That code overwrites the variable Index ("="), placing the value in a register ("r"). Using the generic "r" constraint instead of a constraint for a specific register allows the compiler to pick the register to use, which can result in more efficient code. This may not be possible if an assembler instruction requires a specific register.

The following i386 example uses the asmSymbolicName operand. It produces the same result as the code above, but some may consider it more readable or more maintainable since reordering index numbers is not necessary when adding or removing operands. The names aIndex and aMask are only used to emphasize which names get used where. It is acceptable to reuse the names Index and Mask.

     uint32_t Mask = 1234;
     uint32_t Index;
     
       asm ("bsfl %[aMask], %[aIndex]"
          : [aIndex] "=r" (Index)
          : [aMask] "r" (Mask)
          : "cc");

Here are some more examples of output operands.

     uint32_t c = 1;
     uint32_t d;
     uint32_t *e = &c;
     
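     asm ("mov %[e], %[d]"
        : [d] "=rm" (d)
        : [e] "r" (*e));

Here, d may either be in a register or in memory. Since the compiler might already have the current value of the uint32_t pointed to by e in a register, you can enable it to choose the best location for d by specifying both constraints. The input is limited to "r" because the x86 mov instruction cannot take two memory operands; if both operands allowed "m", the compiler could place both in memory and the assembler would reject the instruction.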

6.43.2.4 Input Operands

Input operands make inputs from C variables and expressions available to the assembly code.

Specify input operands by using the format:

     [ [asmSymbolicName] ] "constraint" (cexpression)

asmSymbolicName
When not using asmSymbolicNames, use the (zero-based) position of the operand in the list of operands, including outputs, in the assembler template. For example, if there are two output parameters and three inputs, %2 refers to the first input, %3 to the second, and %4 to the third. When using an asmSymbolicName, reference it by enclosing the name in square brackets (e.g. %[Value]). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.

constraint
Input constraints must be a string containing one or more constraints (see Constraints). When you give more than one possible constraint (for example, "irm"), the compiler will choose the most efficient method based on the current context. Input constraints may not begin with either "=" or "+". If you must use a specific register, but your Machine Constraints do not provide sufficient control to select the specific register you want, Local Reg Vars may provide a solution (see Local Reg Vars).

Input constraints can also be digits (for example, "0"). This indicates that the specified input will be in the same place as the output constraint at the (zero-based) index in the output constraint list. When using asmSymbolicNames for the output operands, you may use these names (enclosed in brackets []) instead of digits.

cexpression
This is the C variable or expression being passed to the asm statement as input.

When the compiler selects the registers to use to represent the input operands, it will not use any of the clobbered registers (see Clobbers).

If there are no output operands but there are input operands, place two consecutive colons where the output operands would go:

     __asm__ ("some instructions"
        : /* No outputs. */
        : "r" (Offset / 8));

Warning: Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands will contain the same values as they had before executing the assembler. It is not possible to use Clobbers to inform the compiler that the values in these inputs are changing. One common work-around is to tie the changing input variable to an output variable that never gets used. Note, however, that if the code that follows the asm statement makes no use of any of the output operands, the GCC optimizers may discard the asm statement as unneeded (see Volatile).

Remarks:

The total number of input + output + goto operands has a limit of 30.

asm supports operand modifiers on operands (for example %k2 instead of simply %2). Typically these modifiers are hardware dependent. The list of supported modifiers for i386 is found at i386 Operand modifiers.

Examples:

In this example using the fictitious combine instruction, the constraint "0" for input operand 1 says that it must occupy the same location as output operand 0. Only input operands may use numbers in constraints, and they must each refer to an output operand. Only a number (or the symbolic assembler name) in the constraint can guarantee that one operand is in the same place as another. The mere fact that foo is the value of both operands is not enough to guarantee that they are in the same place in the generated assembler code.

     asm ("combine %2, %0"
        : "=r" (foo)
        : "0" (foo), "g" (bar));

Here is an example using symbolic names.

     asm ("cmoveq %1, %2, %[result]"
        : [result] "=r"(result)
        : "r" (test), "r" (new), "[result]" (old));

6.43.2.5 Clobbers

While the compiler is aware of changes to entries listed in the output operands, the assembler code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is enclosed in double quotes and separated by commas.

Clobber descriptions may not in any way overlap with an input or output operand. For example, you may not have an operand describing a register class with one member when listing that register in the clobber list. Variables declared to live in specific registers (see Explicit Reg Vars), and used as asm input or output operands, must have no part mentioned in the clobber description. In particular, there is no way to specify that input operands get modified without also specifying them as output operands.

When the compiler selects which registers to use to represent input and output operands, it will not use any of the clobbered registers. As a result, clobbered registers are available for any use in the assembler code.

Here is a realistic example for the VAX showing the use of clobbered registers:

     asm volatile ("movc3 %0, %1, %2"
                        : /* No outputs. */
                        : "g" (from), "g" (to), "g" (count)
                        : "r0", "r1", "r2", "r3", "r4", "r5");

Also, there are two special clobber arguments:

  1. The "cc" clobber indicates that the assembler code modifies the flags register. On some machines, GCC represents the condition codes as a specific hardware register; "cc" serves to name this register. On other machines, condition code handling is different, and specifying "cc" has no effect. But it is valid no matter what the machine.
  2. The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler will not assume that any values read from memory before an asm will remain unchanged after that asm; it will reload them as needed. This effectively forms a read/write memory barrier for the compiler.

    Note that this clobber does not prevent the processor from doing speculative reads past the asm statement. To prevent that, you need processor-specific fence instructions.

    Flushing registers to memory has performance implications and may be an issue for time-sensitive code. One trick to avoid this is available if the size of the memory being accessed is known at compile time. For example, if accessing ten bytes of a string, use a memory input like:

          "m" ( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )

6.43.2.6 Goto Labels

asm goto allows assembly code to jump to one or more C labels. The GotoLabels section in an asm goto statement contains a comma-separated list of all C labels to which the assembler code may jump. GCC assumes that asm execution falls through to the next statement (if this is not the case, consider using the __builtin_unreachable intrinsic after the asm statement). Optimization of asm goto may be improved by using the hot and cold label attributes (see Label Attributes). The total number of input + output + goto operands has a limit of 30.

An asm goto statement cannot have outputs (which means that the statement is implicitly volatile). This is due to an internal restriction of the compiler: control transfer instructions cannot have outputs. If the assembler code does modify anything, use the "memory" clobber to force the optimizers to flush all register values to memory, and reload them if necessary, after the asm statement.

To reference a label, prefix it with %l (that's a lowercase L) followed by its (zero-based) position in GotoLabels plus the number of input arguments. For example, if the asm has three inputs and references two labels, refer to the first label as %l3 and the second as %l4. You can also reference a label by its C name in square brackets, as %l[error] does in the second example below.

asm statements may not perform jumps into other asm statements. GCC's optimizers do not know about these jumps; therefore they cannot take account of them when deciding how to optimize.

Example code for i386 might look like:

     asm goto (
         "btl %1, %0\n\t"
         "jc %l2"
         : /* No outputs. */
         : "r" (p1), "r" (p2)
         : "cc"
         : carry);
     
     return 0;
     
     carry:
     return 1;

The following example shows an asm goto that uses the memory clobber.

     int frob(int x)
     {
       int y;
       asm goto ("frob %%r5, %1; jc %l[error]; mov (%2), %%r5"
                 : /* No outputs. */
                 : "r"(x), "r"(&y)
                 : "r5", "memory"
                 : error);
       return y;
     error:
       return -1;
     }

6.43.2.7 i386 Operand modifiers

Input, output, and goto operands for extended asm statements can use modifiers to affect the code output to the assembler. For example, the following code uses the "h" and "b" modifiers for i386:

     uint16_t  num;
     asm volatile ("xchg %h0, %b0" : "+a" (num) );

These modifiers generate this assembler code:

     xchg %ah, %al

The rest of this discussion uses the following code for illustrative purposes.

     int main()
     {
        int iInt = 1;
     
     top:
     
        asm volatile goto ("some assembler instructions here"
        : /* No outputs. */
        : "q" (iInt), "X" (sizeof(unsigned char) + 1)
        : /* No clobbers. */
        : top);
     }

With no modifiers, this is what the output from the operands would be for the att and intel dialects of assembler:

     Operand  masm=att  masm=intel
     %0       %eax      eax
     %1       $2        2
     %2       $.L2      OFFSET FLAT:.L2

The table below shows the list of supported modifiers and their effects.

     Modifier  Description                                          Operand  masm=att  masm=intel
     z         Print the opcode suffix for the size of the current  %z0      l
               integer operand (one of b/w/l/q).
     b         Print the QImode name of the register.               %b0      %al       al
     h         Print the QImode name for a “high” register.         %h0      %ah       ah
     w         Print the HImode name of the register.               %w0      %ax       ax
     k         Print the SImode name of the register.               %k0      %eax      eax
     q         Print the DImode name of the register.               %q0      %rax      rax
     l         Print the label name with no punctuation.            %l2      .L2       .L2
     c         Require a constant operand and print the constant    %c1      2         2
               expression with no punctuation.

6.43.2.8 i386 floating-point asm operands

On i386 targets, there are several rules on the usage of stack-like registers in the operands of an asm. These rules apply only to the operands that are stack-like registers:

  1. Given a set of input registers that die in an asm, it is necessary to know which are implicitly popped by the asm, and which must be explicitly popped by GCC.

    An input register that is implicitly popped by the asm must be explicitly clobbered, unless it is constrained to match an output operand.

  2. For any input register that is implicitly popped by an asm, it is necessary to know how to adjust the stack to compensate for the pop. If any non-popped input is closer to the top of the reg-stack than the implicitly popped register, it would not be possible to know what the stack looked like—it's not clear how the rest of the stack “slides up”.

    All implicitly popped input registers must be closer to the top of the reg-stack than any input that is not implicitly popped.

    It is possible that if an input dies in an asm, the compiler might use the input register for an output reload. Consider this example:

              asm ("foo" : "=t" (a) : "f" (b));
    

    This code says that input b is not popped by the asm, and that the asm pushes a result onto the reg-stack, i.e., the stack is one deeper after the asm than it was before. However, reload might decide that it can use the same register for both the input and the output.

    To prevent this from happening, if any input operand uses the f constraint, all output register constraints must use the & early-clobber modifier.

    The example above would be correctly written as:

              asm ("foo" : "=&t" (a) : "f" (b));
    
  3. Some operands need to be in particular places on the stack. All output operands fall in this category—GCC has no other way to know which registers the outputs appear in unless you indicate this in the constraints.

    Output operands must specifically indicate which register an output appears in after an asm. =f is not allowed: the operand constraints must select a class with a single register.

  4. Output operands may not be “inserted” between existing stack registers. Since no 387 opcode uses a read/write operand, all output operands are dead before the asm, and are pushed by the asm. It makes no sense to push anywhere but the top of the reg-stack.

    Output operands must start at the top of the reg-stack: output operands may not “skip” a register.

  5. Some asm statements may need extra stack space for internal calculations. This can be guaranteed by clobbering stack registers unrelated to the inputs and outputs.

Here are a couple of reasonable asm statements that follow these rules. This asm takes one input, which is internally popped, and produces two outputs.

     asm ("fsincos" : "=t" (cos), "=u" (sin) : "0" (inp));

This asm takes two inputs, which are popped by the fyl2xp1 opcode, and replaces them with one output. The st(1) clobber is necessary for the compiler to know that fyl2xp1 pops both inputs.

     asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");