Differences between revisions 51 and 52
Revision 51 as of 2020-09-20 15:48:33
Size: 24401
Editor: GJLay
Comment: Typo.
Revision 52 as of 2020-11-21 19:04:03
Size: 24407
Editor: GJLay
Comment: Link GIT instead of now unavailable SVN.


Application Binary Interface and implementation defined behaviour of avr-gcc. Object format bits are not discussed here. See also C Implementation-defined behaviour.

Type Layout

Endianness: Little













long long




unsigned int










depends on configuration and command line options

long double


depends on configuration and command line options



Deviations from the Standard

long double

In avr-gcc up to v9, double and long double are only 32 bits wide and implemented in the same way as float.

In avr-gcc v10 and higher, the layouts of double and long double are determined by the configure options --with-double= and --with-long-double=, respectively. By default, double has the same layout as float, and long double uses a 64-bit IEEE format; see the GCC configure options for details. Depending on the configuration, the command line options -mdouble=32 and -mdouble=64 are available so that the type layout of double can be chosen at compile time; likewise -mlong-double=32 and -mlong-double=64 for long double. To test in a program which type layout has been chosen, use the GCC built-in macros __SIZEOF_DOUBLE__ and __SIZEOF_LONG_DOUBLE__.
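A minimal sketch of how a program might query the chosen layout by means of these built-in macros. The names APP_DOUBLE_IS_64BIT, double_bits and long_double_bits are made up for this illustration; only __SIZEOF_DOUBLE__ and __SIZEOF_LONG_DOUBLE__ come from the compiler.

```c
#include <stdint.h>

/* Hypothetical convenience macro: 1 if double is a 64-bit type. */
#if __SIZEOF_DOUBLE__ == 8
#define APP_DOUBLE_IS_64BIT 1
#else
#define APP_DOUBLE_IS_64BIT 0
#endif

/* Width of double in bits, as selected by -mdouble= or the configuration. */
static inline uint8_t double_bits (void)
{
    return (uint8_t) (8 * __SIZEOF_DOUBLE__);
}

/* Width of long double in bits, as selected by -mlong-double= or the
   configuration. */
static inline uint8_t long_double_bits (void)
{
    return (uint8_t) (8 * __SIZEOF_LONG_DOUBLE__);
}
```

Because the macros are known to the preprocessor, they can also be used in #if to select between code paths at compile time, as APP_DOUBLE_IS_64BIT does.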

8-bit int with -mint8

With -mint8, int is only 8 bits wide, which does not comply with the C standard. Notice that -mint8 is not a multilib option and is supported neither by AVR-Libc (except stdint.h) nor by newlib.
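As a sketch of how code might stay independent of the width of int, the following uses the fixed-width types from stdint.h and detects an 8-bit int through the GCC built-in macro __SIZEOF_INT__. The names APP_INT_IS_8BIT and sum_bytes are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical macro: 1 if int is only 8 bits wide, as with -mint8. */
#if __SIZEOF_INT__ == 1
#define APP_INT_IS_8BIT 1
#else
#define APP_INT_IS_8BIT 0
#endif

/* Sum up to 255 bytes into a 16-bit accumulator. Using uint16_t instead
   of int keeps the result range the same regardless of -mint8. */
uint16_t sum_bytes (const uint8_t *data, uint8_t count)
{
    uint16_t sum = 0;

    for (uint8_t i = 0; i < count; i++)
        sum = (uint16_t) (sum + data[i]);

    return sum;
}
```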

  • -mint8












    long long




    long unsigned int



    long int

  • Fixed-Point Support

    avr-gcc 4.8 and up supports fixed point arithmetic according to ISO/IEC TR 18037. The support is not complete. The type layouts are as follows:


















    long long




    GCC extension













    long long




    GCC extension

    Overflow behaviour of the non-saturated arithmetic is unspecified.

    Please notice that some private ports found on the web implement different layouts.

    Register Layout

    Values that occupy more than one 8-bit register start in an even register.

    Fixed Registers

    Fixed Registers are registers that won't be allocated by GCC's register allocator. Registers R0 and R1 are fixed and used implicitly while printing out assembler instructions:

    R0
    is used as a scratch register that need not be restored after its usage. It must be saved and restored in the prologue and epilogue of interrupt service routines (ISRs). In inline assembler you can use __tmp_reg__ for the scratch register.

    R1
    always contains zero. During an insn the content might be destroyed, e.g. by a MUL instruction that uses R0/R1 as implicit output register. If an insn destroys R1, the insn must restore R1 to zero afterwards. This register must be saved in ISR prologues and must then be set to zero because R1 might contain values other than zero. The ISR epilogue restores the value. In inline assembler you can use __zero_reg__ for the zero register.

    T
    The T flag in the status register (SREG) is used in the same way as the temporary scratch register R0.

    User-defined global registers by means of global register asm and / or -ffixed-n won't be saved or restored in function pro- and epilogue.
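A minimal sketch of using __tmp_reg__ in inline assembler, following the common byte-swap pattern. The function name byte_swap16 and the portable fallback for non-AVR compilers are assumptions added so that the example also builds on a host.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t byte_swap16 (uint16_t value)
{
#if defined (__AVR__)
    /* R0 (__tmp_reg__) is a scratch register, so the asm may overwrite
       it without saving or restoring it. %A0 and %B0 denote the low and
       the high byte of operand 0. */
    __asm__ ("mov __tmp_reg__, %A0" "\n\t"
             "mov %A0, %B0"         "\n\t"
             "mov %B0, __tmp_reg__"
             : "+r" (value));
    return value;
#else
    /* Portable fallback, only here so the sketch compiles elsewhere. */
    return (uint16_t) ((value << 8) | (value >> 8));
#endif
}
```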

    Call-Used Registers

    The call-used or call-clobbered general purpose registers (GPRs) are registers that might be destroyed (clobbered) by a function call.

    R18–R27, R30, R31
    These GPRs are call clobbered. An ordinary function may use them without restoring the contents. Interrupt service routines (ISRs) must save and restore each register they use.
    R0, T-Flag
    The temporary register and the T-flag in SREG are also call-clobbered, but this knowledge is not exposed explicitly to the compiler (R0 is a fixed register).

    Call-Saved Registers

    R2–R17, R28, R29
    The remaining GPRs are call-saved, i.e. a function that uses such a register must restore its original content. This applies even if the register is used to pass a function argument.
    The zero-register is implicitly call-saved (implicit because R1 is a fixed register).

    Frame Layout

    Frame Layout after Function Prologue

    incoming arguments

    return address (2–3 bytes)

    saved registers

    stack slots, Y+1 points at the bottom

    During compilation the compiler may come up with an arbitrary number of pseudo registers which will be allocated to hard registers during register allocation.

    • Pseudos that don't get a hard register will be put into a stack slot and loaded / stored as needed.
    • In order to access stack locations, avr-gcc will set up a 16-bit frame pointer in R29:R28 (Y) because the stack pointer (SP) cannot be used to access stack slots.
    • The stack grows downwards. Smaller addresses are at the bottom of the frame layout shown above.
    • Stack pointer and frame pointer are not aligned, i.e. 1-byte aligned.
    • After the function prologue, the frame pointer will point one byte below the stack frame, i.e. Y+1 points to the bottom of the stack frame.
    • Any of "incoming arguments", "saved registers" or "stack slots" in the frame layout shown above may be empty.
    • Even "return address" may be empty which happens for functions that are tail-called.

    Calling Convention

    • An argument is passed either completely in registers or completely in memory.
    • To find the register where a function argument is passed, initialize the register number Rn with R26 and follow this procedure:

      1. If the argument size is an odd number of bytes, round up the size to the next even number.
      2. Subtract the rounded size from the register number Rn.

      3. If the new Rn is at least R8 and the size of the object is non-zero, then the low-byte of the argument is passed in Rn. Subsequent bytes of the argument are passed in the subsequent registers, i.e. in increasing register numbers.

      4. If the new register number Rn is smaller than R8 or the size of the argument is zero, the argument will be passed in memory.

      5. If the current argument is passed in memory, stop the procedure: All subsequent arguments will also be passed in memory.
      6. If there are arguments left, goto 1. and proceed with the next argument.
    • Return values with a size of 1 byte up to and including a size of 8 bytes will be returned in registers. Return values whose size is outside that range will be returned in memory.
    • If a return value cannot be returned in registers, the caller will allocate stack space and pass the address as implicit first pointer argument to the callee. The callee will put the return value into the space provided by the caller.
    • If the return value of a function is returned in registers, the same registers are used as if the value were the first parameter of a non-varargs function. For example, an 8-bit value is returned in R24 and a 32-bit value is returned in R22...R25.
    • Arguments of varargs functions are passed on the stack. This applies even to the named arguments.

    For example, suppose a function with the following prototype:

    • int func (char a, long b);


    • a will be passed in R24.
    • b will be passed in R20, R21, R22 and R23 with the LSB in R20 and the MSB in R23.
    • The result is returned in R24 (LSB) and R25 (MSB).
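The register assignment procedure and the rule for return values can be sketched in C as follows. This is only an illustration of the steps listed above for non-varargs functions, not code taken from the compiler; the names avr_assign_arg_regs, avr_return_reg and AVR_IN_MEMORY are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define AVR_IN_MEMORY (-1)  /* hypothetical marker: passed in memory */

/* For each argument size in bytes, store the number of the register that
   holds the argument's low byte, or AVR_IN_MEMORY. */
void avr_assign_arg_regs (const size_t *sizes, size_t count, int *regs)
{
    int rn = 26;        /* the procedure starts with Rn = R26 */
    int in_memory = 0;  /* set once an argument went to memory */

    assert (count == 0 || (sizes != NULL && regs != NULL));

    for (size_t i = 0; i < count; i++) {
        size_t size = sizes[i];

        /* More than 18 bytes can never fit into R8...R25. */
        if (!in_memory && size != 0 && size <= 18) {
            /* Step 1: round odd sizes up to the next even number. */
            int rounded = (int) (size + (size & 1u));

            /* Steps 2 and 3: subtract and test against R8. */
            if (rn - rounded >= 8) {
                rn -= rounded;
                regs[i] = rn;
                continue;
            }
        }

        /* Steps 4 and 5: this and all following arguments use memory. */
        in_memory = 1;
        regs[i] = AVR_IN_MEMORY;
    }
}

/* Register holding the low byte of a return value of the given size, or
   AVR_IN_MEMORY if the value is returned in memory. */
int avr_return_reg (size_t size)
{
    if (size < 1 || size > 8)
        return AVR_IN_MEMORY;

    return 26 - (int) (size + (size & 1u));
}
```

For the prototype int func (char a, long b) the sizes are 1 and 4 bytes, which yields R24 for a and R20 for b, in line with the example above.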

    Exceptions to the Calling Convention

    GCC comes with libgcc, a runtime support library. This library implements functions that are too complicated to be emitted inline by GCC. Which functions are used, and when, depends on the target architecture, on what instructions are available, on how expensive they are and on the optimization level.

    Functions in libgcc are implemented in C or hand-written assembly. In the latter case, some functions use a special ABI that allows better code generation by the compiler.

    For example, the function that computes unsigned 8-bit quotient and remainder, __udivmodqi4, just returns the quotient and the remainder and clobbers R22 and R23. The compiler knows that the function does not destroy R30, for example, and may hold a value in R30 across the function call. This reduces the register pressure in functions that call __udivmodqi4.







    Availability | Operation | Clobbers | Description
    4.7+ && MUL | SI:22 = HI:26 * HI:18 | | Multiply 2 unsigned 16-bit integers to a 32-bit result
    4.7+ && MUL | SI:22 = HI:26 * HI:18 | | Multiply 2 signed 16-bit integers to a 32-bit result
    4.7+ && MUL | SI:22 = HI:26 * HI:18 | | Multiply the signed 16-bit integer in R26 with the unsigned 16-bit integer in R18 to a 32-bit result
    4.7+ && MUL | SI:22 = HI:26 * SI:18 | | Multiply an unsigned 16-bit integer with a 32-bit integer to a 32-bit result
    4.7+ && MUL | SI:22 = HI:26 * SI:18 | | Multiply a signed 16-bit integer with a 32-bit integer to a 32-bit result
    | QI:24 = QI:24 / QI:22; QI:25 = QI:24 % QI:22 | | Unsigned 8-bit integer quotient and remainder
    | QI:24 = QI:24 / QI:22; QI:25 = QI:24 % QI:22 | R23, Rtmp, T | Signed 8-bit integer quotient and remainder
    | HI:22 = HI:24 / HI:22; HI:24 = HI:24 % HI:22 | R21, R26...27 | Unsigned 16-bit integer quotient and remainder
    | HI:22 = HI:24 / HI:22; HI:24 = HI:24 % HI:22 | R21, R26...27, Rtmp, T | Signed 16-bit integer quotient and remainder

    The Operation column uses GCC's machine modes to describe how values in registers are interpreted.

    Machine Modes

    QI: Quarter, 8 bit

    HI: Half, 16 bit

    SI: Single, 32 bit

    DI: Double, 64 bit

    PSI: Partial Single, 24 bit










    Signed _Accum




    Signed _Fract (Q-Format)





    Unsigned _Accum




    Unsigned _Fract (Q-Format)





    Reduced Tiny

    On the Reduced Tiny cores (16 GPRs only) several modifications to the ABI above apply:

    • Call-saved registers are: R18–R19, R28–R29.
    • Fixed Registers are R16 (__tmp_reg__) and R17 (__zero_reg__).

    • Registers used to pass arguments to functions and return values from functions are R25...R18 (instead of R25...R8).

    There is only limited library support from both libgcc and AVR-LibC; for example, there is no float support and no printf support.



    • Signed and unsigned 24-bit integers: __int24 (v4.7), __uint24 (v4.7).


    • Variable: progmem, absdata (v7).

    • Function: interrupt, signal, naked, OS_main (v4.4), OS_task (v4.4), no_gccisr (v8).

    • Type: (none).


    • (none)

    Address Spaces

    • __flash (v4.7), __flash1 ... __flash5 (v4.7), __memx (v4.7).

    Using avr-gcc

    Supporting "unsupported" Devices

    avr-gcc v8.4+, v9.3 and newer

    Since v10 there is a somewhat simpler scheme to provide a device specs file than the one outlined in the next section: You can specify the specs file directly by means of

    • avr-gcc -nodevicespecs -specs=my-spec-file ...

    There is no more need to mess with system paths like with -B path, and there is no more need to specify -mmcu=mydevice: All information is taken from my-spec-file, see also the GCC online documentation for -nodevicespecs.

    avr-gcc v5 and newer

    In contrast to older versions of the compiler that support -mmcu=device natively, v5+ comes with a bunch of spec files in ./lib/gcc/avr/version/device-specs. These files are generated when the compiler is built and are part of each distribution since then. Spec files specify substitution and transformation rules for command line options for the compiler proper and for subprograms like assembler and linker.

    Adding support for a new device consists of writing a new spec file for that device and supplying it by means of

    • avr-gcc -mmcu=mydevice -B path-to-dir ...

    where path-to-dir is a directory containing a folder named device-specs which contains a file named specs-mydevice. As a blueprint, start with an already existing spec file for a device as closely related to mydevice as possible. Also read the comments in that spec file.

    Just like with older versions, you have to get the device headers, which are the realm of AVR-Libc, from somewhere; the same applies to the startup code in crtmydevice.o and to the device library libmydevice.a. If you do not need or do not have a device library, -nodevicelib will do, but note that some non-standard functionality like EEPROM support will then be missing.

    avr-gcc v4.9 and below

    avr-gcc and avr-as support the -mmcu=device command line option to generate code for a specific device. Currently (2012), there are more than 200 known AVR devices and the hardware vendor keeps releasing new devices. If you need support for such a device and don't want to rebuild the tools, you can

    1. Sit and wait until support for your -mmcu=device is added to the tools.

    2. Use appropriate command line options to compile for your favourite device.

    Approach 1 is comfortable but slow. Lazy developers that don't care for time-to-market will use it.

    Approach 2 is preferred if you want to start development as soon as possible and don't want to wait until the tool chain with respective device support is released. This approach is only possible if the compiler and Binutils already come with support for the core architecture of your device.

    When you feed code into the compiler and compile for a specific device, the compiler only cares about the respective core; it does not care about the exact device. It does not matter to the compiler how many I/O pins the device has, at what voltage it operates, how much RAM is present, how many timers or UARTs are on the silicon or in what package it is shipped. The only things the compiler does with -mmcu=device are to define a device-specific built-in macro and to call the linker in a specific way, i.e. the compiler driver behaves a bit differently, but the sub-tools like the compiler proper and the assembler will generate exactly the same code.

    Thus, you can support your device by setting these options by hand.

    Additionally, we need the following to compile a C program:

    • A device support header avr/io.h similar to the headers provided by AVR Libc.

    • Startup code for the device.

    The Device Header avr/io.h

    This header and its subheaders contain almost all information about a particular device like SFR addresses, size of the interrupt table and interrupt names, etc.

    After all, it's just text and you can write it yourself. Find a device that is already supported by AVR-Libc and that is similar enough to your new device to serve as a reasonable starting point for the new device description.

    If you are lucky, the device is already supported by AVR-Libc but not yet by the compiler. In that case, you can use verbatim copies from AVR-Libc.

    Yet another approach is to write the file from scratch or not to use avr/io.h like headers at all. In that case, you provide all needed definitions like, say, SP and size of the vector table yourself.

    If your toolchain is distributed with AVR-Libc then avr/io.h is located in the installation directory at ./avr/include i.e. you find a file io.h in ./avr/include/avr. In that file you find the lines:

    #if defined (__AVR_AT94K__)
    #  include <avr/ioat94k.h>
    #elif defined (__AVR_AT43USB320__)
    #  include <avr/io43u32x.h>
    /* many many more entries */
    #else
    #  if !defined(__COMPILING_AVR_LIBC__)
    #    warning "device type not defined"
    #  endif
    #endif

    Add an entry for __AVR_mydevice__ and include your new file avr/iomydevice.h.

    If you don't want to change the existing avr/io.h then copy it to a new directory and add that directory as system search path by means of -isystem whenever you compile or preprocess a C or assembler source that shall include the extended avr/io.h. Notice that the new directory will contain a subdirectory named avr.

    Compiling the Code

    Let's start with a simple C program, source.c:

    #include <avr/io.h>

    int var;

    int main (void)
    {
        /* SP is the stack pointer SFR as defined by avr/io.h. */
        return var + SP;
    }

    Your source directory then contains the following files:

    • source.c    gcrt1.S    macros.inc    sectionname.h

    The startup code gcrt1.S and macros.inc are verbatim copies from AVR-Libc.

    sectionname.h is included by macros.inc but we don't need it: Simply provide sectionname.h as an empty file.

    For the sake of simplicity, we show how to compile for a device that is similar to ATmega8 so that we don't need to extend avr/io.h to show the work flow. If you copied avr/io.h to a new place, don't forget to add the respective -isystem to the first two commands for source.c and gcrt1.S.

    ATmega8 is a device in core family avr4, thus we compile and assemble our source.c for that core architecture. __AVR_ATmega8__ stands for the subheader selector you added to avr/io.h.

    • avr-gcc -mmcu=avr4 -D__AVR_ATmega8__ -c source.c -Os

    Similarly, we assemble the startup code for our device by means of:

    • avr-gcc -mmcu=avr4 -D__AVR_ATmega8__ -c gcrt1.S -o crt0-mydevice.o

    Finally, we link the stuff together to get a working source.elf (assuming that RAM starts at address 0x124):

    • avr-gcc -mmcu=avr4 -Tdata 0x800124 source.o crt0-mydevice.o -nostartfiles -o source.elf



    Libf7 is an ad-hoc, AVR-specific, 64-bit floating point emulation written in GNU-C and (inline) assembly. It is hosted and deployed as part of libgcc. Hence, it will be part of any avr-gcc distribution from v10 onwards without any further ado, unless the compiler is configured with --with-libf7=no.


    • The emulated 64-bit floating point representation is IEEE compatible: Little endian, 1 sign bit, 11 bits for the encoded exponent, 52 bits for the encoded mantissa.
    • The internal, decoded representation uses f7_t, a struct with 1 byte for flags, 7 bytes for the mantissa and 2 bytes for the exponent.

    • Functions like sin are wrappers that convert double (or long double for that matter) to f7_t and then call functions prefixed with __f7_ that operate on the internal representation. In the case of sin, the worker function has prototype

      • void __f7_sin (f7_t*, const f7_t*)

    • The basic arithmetic is implemented mostly in assembly. Functions like __f7_sin are implemented in C.

    • The transcendental functions are implemented using MiniMax approximations, i.e. they minimize the maximum norm. Most of these functions use rational MiniMax approximations because they perform better than CORDIC (and they perform better than Taylor or Padé expansions, of course). Square-root uses 3 iterations of Newton-Raphson.

    • Portability to other architectures or to other compilers was of no consideration; the implementation focuses solely on avr-gcc. This means that if you want to implement a 64-bit floating point emulation to be used elsewhere, Libf7 is of no use — except that the used algorithms and MiniMax polynomials might provide you some additional perspectives.
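To make the two representations from the list above more concrete, here is a small sketch. The struct f7_sketch_t only mirrors the byte sizes given above (1 + 7 + 2 bytes); its name and field names are assumptions and not necessarily those used by Libf7. Likewise, ieee64_split is a made-up helper that merely separates the fields of the encoded 64-bit format; it is not the conversion routine of Libf7.

```c
#include <stdint.h>

/* Hypothetical sketch of the decoded representation: 1 byte of flags,
   7 bytes of mantissa and 2 bytes of exponent, 10 bytes in total. */
typedef struct
{
    uint8_t flags;
    uint8_t mant[7];
    int16_t expo;
} f7_sketch_t;

/* The three fields of an encoded IEEE 64-bit value. */
typedef struct
{
    uint8_t  sign;  /*  1 bit  */
    uint16_t expo;  /* 11 bits, biased (encoded) exponent */
    uint64_t mant;  /* 52 bits, encoded mantissa */
} ieee64_fields_t;

/* Split the bit pattern of a 64-bit IEEE value into its fields. */
ieee64_fields_t ieee64_split (uint64_t bits)
{
    ieee64_fields_t f;

    f.sign = (uint8_t) (bits >> 63);
    f.expo = (uint16_t) ((bits >> 52) & 0x7ff);
    f.mant = bits & ((UINT64_C (1) << 52) - 1);

    return f;
}
```

For example, the bit pattern 0x3FF0000000000000 of 1.0 splits into sign 0, encoded exponent 0x3ff and encoded mantissa 0.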


    Libf7 is incomplete:

    • For devices that do not support the MUL instruction, assembly routines that would require MUL instructions are not implemented. This means that when you try to link programs with 64-bit double for a device without MUL, you will get undefined references from the linker, for example to __f7_mul_mant_asm.

    • Some functions from math.h like atan2, lround, lrint, fma, Bessel and Gamma are not implemented. If you try to use them, you will get undefined references from the linker; in the case of atan2 the missing function is __f7_atan2. If you really need it, you can provide such functions in your projects, or better still contribute them to GCC.

    • Signed zeroes are handled like zero.
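One way a project might steer clear of the missing routines on devices without MUL is to select its floating point type at compile time. The sketch below relies on the avr-gcc built-in macro __AVR_HAVE_MUL__; the names APP_F64_USABLE, app_real_t and app_half are assumptions for this example, and it further assumes that long double is the 64-bit type in your configuration.

```c
/* Hypothetical switch: 64-bit arithmetic is considered usable when not
   compiling for AVR at all, or when the AVR device has MUL. */
#if !defined (__AVR__) || defined (__AVR_HAVE_MUL__)
#define APP_F64_USABLE 1
#else
#define APP_F64_USABLE 0
#endif

#if APP_F64_USABLE
typedef long double app_real_t;  /* assumed to be 64 bits wide on AVR */
#else
typedef float app_real_t;        /* 32-bit float does not need Libf7 */
#endif

/* Trivial example computation using the selected type. */
app_real_t app_half (app_real_t x)
{
    return x * (app_real_t) 0.5;
}
```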

    Libf7 is not optimal:

    • sqrt uses 3 steps of Newton-Raphson iteration. Using an old-school algorithm might reduce code size (after considering all dependencies). And it would be faster, of course, speeding up sqrt, hypot, asin, acos and their long double variants.
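For illustration, three Newton-Raphson steps for a square root can be sketched as below. This is not the code of Libf7, which operates on f7_t and may use a different form of the iteration; the function name sqrt_newton3 and the caller-supplied starting value x0 are assumptions of this example.

```c
#include <assert.h>

/* Approximate sqrt (a) for a > 0 with exactly three Newton-Raphson
   steps x <- (x + a / x) / 2, starting from the guess x0 > 0. The
   accuracy after three steps depends on the quality of x0. */
double sqrt_newton3 (double a, double x0)
{
    double x = x0;

    assert (a > 0 && x0 > 0);

    for (int i = 0; i < 3; i++)
        x = 0.5 * (x + a / x);

    return x;
}
```

Each step roughly doubles the number of correct digits, which is why a reasonably good starting value makes a small, fixed number of steps sufficient.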

    Other Implementations

    • fp64lib from Uwe Bissinger: Written in GNU assembly. Slightly less precise. Roughly the same speed (except for square root which is several times faster). Smaller stack footprint. Slightly smaller code for basic arithmetic, otherwise comparable code sizes. No build script / Makefile as it targets the Arduino ecosystem.

    • avr_f64.c from Detlef with improvements from Florian Königstein: Implemented in C. Resource consumption might be a multiple of what Libf7 consumes. Easy to integrate in own projects that use avr-gcc without native 64-bit double support. Precision is quite good except for some corner(?) cases where it might deteriorate. Could be compiled after fixing minor problems (missing const at progmem). Should also work with other compilers / targets.

    • dannis64bit.S from Peter Danegger. Written in GNU assembly.

    None: avr-gcc (last edited 2020-11-21 19:04:03 by GJLay)