Stack alignment on modern 32 bit bare metal ARMs?

Richard Earnshaw (lists) Richard.Earnshaw@arm.com
Mon Aug 7 10:45:20 GMT 2023


This should be on gcc-help@gcc.gnu.org, not the main gcc@ list.  I've 
sent my response there (and hopefully BCC gcc@).


On 06/08/2023 01:30, Barrie Slaymaker via Gcc wrote:
> Hi,
> 
> I'm cross compiling for 32-bit bare-metal ARMs (modern ones: Cortex-M4 and
> Cortex-M33) with gcc 12.3.0, the latest available from ARM (see gcc -v
> output below), and have found that va_arg(..., double) (i.e.
> __builtin_va_arg()) assumes that doubles are 64-bit aligned, but the stack
> is not always so.
> 
> I searched the bug database but didn't see this, so I'm guessing this isn't
> a GCC bug--the ARM world would be on fire if it were. And I've searched the
> gcc command line options docs, and the ARM architecture docs to no avail.
> I'm hoping I didn't miss something obvious...
> 
> So, does gcc assume or require that doubles on the stack be 64-bit aligned,
> or is there an option we should be passing to either allow 32-bit alignment
> or force 64-bit alignment, or is the MCU vendor's startup code a wee buggy
> (this is what I suspect, but wanted to be damn sure before continuing)?
> 

Your problem is a common one.  GCC preserves 64-bit stack alignment in the
code it generates, but it does not realign the stack if a caller hands it a
misaligned one.  The most likely cause is that the stack was not correctly
aligned before main() was called.  Ensuring that is the job of the startup
code when it sets up the program environment.
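
As a rough illustration of what the startup code has to guarantee, here is a
minimal sketch in C.  The AAPCS requires sp to be 8-byte aligned at public
interfaces; the helper names (align_stack_top, stack_is_aligned,
require_aligned_stack) are hypothetical and not taken from any vendor's
startup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AAPCS: sp must be 8-byte aligned at public interfaces. */
#define STACK_ALIGNMENT 8u

/* Round a candidate initial stack pointer DOWN to an 8-byte boundary.
 * Down, not up: the ARM stack grows toward lower addresses, so rounding
 * up could step outside the region reserved for the stack. */
static inline uintptr_t align_stack_top(uintptr_t top)
{
    return top & ~(uintptr_t)(STACK_ALIGNMENT - 1u);
}

/* True if sp satisfies the 8-byte alignment requirement. */
static inline bool stack_is_aligned(uintptr_t sp)
{
    return (sp & (STACK_ALIGNMENT - 1u)) == 0u;
}

/* Hypothetical debug check that could run early in startup. */
static inline void require_aligned_stack(uintptr_t sp)
{
    assert(stack_is_aligned(sp));
}
```

On Cortex-M parts the initial stack pointer normally comes from the first
vector table entry, so in practice the fix often belongs wherever the stack
top symbol is defined (typically the linker script) rather than in C.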

R.

> Here's the test code:
> 
> void va_args_test(int i, ...) {
>      va_list args;
>      va_start(args, i);
>      double d = (int)va_arg(args, double);
>      va_end(args);
>      // display code elided
> }
> 
> Here's the generated assembly, with commentary mine:
> 
> void va_args_test(int i, ...) {
>      3f60:→  b40f      → push→   {r0, r1, r2, r3}
>      3f62:→  b580      → push→   {r7, lr}
>      3f64:→  b082      → sub→sp, #8
>      3f66:→  af00      → add→r7, sp, #0
> 
>      va_list args;
>      3f68:→  2300      → movs→   r3, #0
>      3f6a:→  607b      → str→r3, [r7, #4]
> 
>      va_start(args, i);
>      3f6c:→  f107 0314 → add.w→  r3, r7, #20
>      3f70:→  607b      → str→r3, [r7, #4]
> 
>      double d = (int)va_arg(args, double);
>      3f72:→  f107 031b → add.w→  r3, r7, #27   ; Loads the address of the last byte of the low order word into r3.
>      3f76:→  f023 0307 → bic.w→  r3, r3, #7    ; Clears the low 3 bits, which works when the double is 64-bit aligned. Not so much otherwise.
>      3f7a:→  f103 0208 → add.w→  r2, r3, #8    ; Increments args' internal pointer
>      3f7e:→  607a      → str→r2, [r7, #4]      ; Saves that pointer
>      3f80:→  e9d3 0100 → ldrd→   r0, r1, [r3]  ; Reads the double, right or wrong...
> 
> Here's the call site assembly:
> 
>      va_args_test(0, (double)1.0);
>      3fc2:→  2200      → movs→   r2, #0
>      3fc4:→  4b09      → ldr→r3, [pc, #36]→  ; (3fec <main+0x44>)
>      3fc6:→  2000      → movs→   r0, #0
>      3fc8:→  4909      → ldr→r1, [pc, #36]→  ; (3ff0 <main+0x48>)
>      3fca:→  4788      → blx→r1
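
To see why the rounding in the listing above only works with an aligned
frame, the address arithmetic can be modelled on a host machine.  This is a
sketch, not compiler output: the offsets are read off the disassembly
(va_start sets the pointer to r7 + 20; the call site passes the double in
r2:r3, which the prologue's push {r0, r1, r2, r3} leaves at r7 + 24), and
the function names and sample addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from the frame pointer r7, read off the disassembly. */
enum {
    VA_START_OFFSET = 20, /* va_start: add.w r3, r7, #20          */
    DOUBLE_OFFSET   = 24  /* slot where the prologue stored r2:r3 */
};

/* Address that va_arg(args, double) reads from: the argument pointer
 * rounded up to 8 bytes (add 7, then clear the low three bits). */
static inline uint32_t va_arg_double_addr(uint32_t r7)
{
    uint32_t ap = r7 + VA_START_OFFSET;
    return (ap + 7u) & ~7u;
}

/* Address where the caller's double actually ended up. */
static inline uint32_t stored_double_addr(uint32_t r7)
{
    return r7 + DOUBLE_OFFSET;
}

/* Debug-style check: the two addresses agree only if r7 is 8-byte aligned. */
static inline void check_frame(uint32_t r7)
{
    assert(va_arg_double_addr(r7) == stored_double_addr(r7));
}
```

With r7 = 0x20007FE0 (8-byte aligned) both functions give 0x20007FF8.  With
r7 = 0x20007FE4, va_arg still reads from 0x20007FF8, which is now the slot
holding r1, while the double sits at 0x20007FFC.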
> 
> This is using GCC 12.3.0, cross-compiling for ARM on x86_64 (gcc -v output
> below sig), with a command line like
> 
> arm-none-eabi-gcc -o ../build/main/PAC5524/tmp/base/src/main.o
> base/src/main.c <<<-I options elided>>> -mcpu=cortex-m4 -march=armv7e-m
> -mfpu=fpv4-sp-d16 -std=gnu99 -ffunction-sections -fno-omit-frame-pointer
> -fno-strict-overflow -fsingle-precision-constant
> -ftrivial-auto-var-init=zero -mthumb -mlittle-endian -mlong-calls
> -mfloat-abi=hard -Og -c -MD -MP
> 
> Removing any one of the -f options happens to align the stack correctly in
> most cases (I've elided the -f options that don't affect this issue as far
> as I can tell).
> 
> Many thanks,
> 
> Barrie
> 
> gcc -v output:
> 
> Using built-in specs.
> COLLECT_GCC=arm-none-eabi-gcc
> COLLECT_LTO_WRAPPER=/usr/share/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-eabi/bin/../libexec/gcc/arm-none-eabi/12.3.1/lto-wrapper
> Target: arm-none-eabi
> Configured with:
> /data/jenkins/workspace/GNU-toolchain/arm-12/src/gcc/configure
> --target=arm-none-eabi
> --prefix=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install
> --with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> --with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> --with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> --with-isl=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> --disable-shared --disable-nls --disable-threads --disable-tls
> --enable-checking=release --enable-languages=c,c++,fortran
> --with-newlib --with-gnu-as --with-gnu-ld
> --with-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install/arm-none-eabi
> --with-multilib-list=aprofile,rmprofile --with-pkgversion='Arm GNU
> Toolchain 12.3.Rel1 (Build arm-12.35)' --with-bugurl=https://bugs.linaro.org/
> Thread model: single
> Supported LTO compression algorithms: zlib
> gcc version 12.3.1 20230626 (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35))
> 
> Test code (the LED lights very prettily when va_arg() returns the correct
> value):
> 
> void va_args_test(int i, ...) {
>      va_list args;
>      va_start(args, i);
>      i = (int)va_arg(args, double);
>      va_end(args);
>      bal_init();
>      bal_set_AUX_LED1(i == 1);
> }
> 
> int main(void) {
>     ...CPU initialization elided...
>      va_args_test(0, (double)1.0);
>      while (true) {
>      }
> }


