Stack alignment on modern 32 bit bare metal ARMs?

Barrie Slaymaker barries1@gmail.com
Mon Aug 7 13:25:52 GMT 2023


Thank you for the answer and the listserv cluebat.

Take care,

Barrie

On Mon, Aug 7, 2023 at 6:45 AM Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:

> This should be on gcc-help@gcc.gnu.org, not the main gcc@ list.  I've
> sent my response there (and hopefully BCC gcc@).
>
>
> On 06/08/2023 01:30, Barrie Slaymaker via Gcc wrote:
> > Hi,
> >
> > I'm cross compiling for 32-bit bare metal ARMs (modern ones: Cortex-M4 and
> > Cortex-M33) w/ gcc 12.3.0, which is the latest available from ARM (see
> > gcc -v output below), and have found that va_arg(..., double) (i.e.
> > __builtin_va_arg()) assumes that doubles are 64-bit aligned, but the stack
> > is not always so.
> >
> > I searched the bug database but didn't see this, so I'm guessing this isn't
> > a GCC bug--the ARM world would be on fire if it were. And I've searched the
> > gcc command line options docs, and the ARM architecture docs to no avail.
> > I'm hoping I didn't miss something obvious...
> >
> > So, does gcc assume or require that doubles on the stack be 64-bit aligned,
> > or is there an option we should be passing to either allow 32-bit alignment
> > or force 64-bit alignment, or is the MCU vendor's startup code a wee buggy
> > (this is what I suspect, but wanted to be damn sure before continuing)?
> >
>
> Your problem is a common one.  GCC maintains 64-bit stack alignment in
> code, but it does not align the stack if the caller messes up.  Your
> most likely problem is that the stack was not correctly aligned before
> calling main().  This is something the startup code must ensure when
> setting up the program environment.
>
> R.
>
> > Here's the test code:
> >
> > void va_args_test(int i, ...) {
> >      va_list args;
> >      va_start(args, i);
> >      double d = (int)va_arg(args, double);
> >      va_end(args);
> >      // display code elided
> > }
> >
> > Here's the generated assembly, with commentary mine:
> >
> > void va_args_test(int i, ...) {
> >      3f60:→  b40f      → push→   {r0, r1, r2, r3}
> >      3f62:→  b580      → push→   {r7, lr}
> >      3f64:→  b082      → sub→sp, #8
> >      3f66:→  af00      → add→r7, sp, #0
> >
> >      va_list args;
> >      3f68:→  2300      → movs→   r3, #0
> >      3f6a:→  607b      → str→r3, [r7, #4]
> >
> >      va_start(args, i);
> >      3f6c:→  f107 0314 → add.w→  r3, r7, #20
> >      3f70:→  607b      → str→r3, [r7, #4]
> >
> >      double d = (int)va_arg(args, double);
> >      3f72:→  f107 031b → add.w→  r3, r7, #27   ; Loads the address of the last byte of the low order word into r3.
> >      3f76:→  f023 0307 → bic.w→  r3, r3, #7    ; Clears the low 3 bits, which works when the double is 64-bit aligned. Not so much otherwise.
> >      3f7a:→  f103 0208 → add.w→  r2, r3, #8    ; Increments args' internal pointer
> >      3f7e:→  607a      → str→r2, [r7, #4]      ; Saves that pointer
> >      3f80:→  e9d3 0100 → ldrd→   r0, r1, [r3]  ; Reads the double, right or wrong...
> >
> > Here's the call site assembly:
> >
> >      va_args_test(0, (double)1.0);
> >      3fc2:→  2200      → movs→   r2, #0
> >      3fc4:→  4b09      → ldr→r3, [pc, #36]→  ; (3fec <main+0x44>)
> >      3fc6:→  2000      → movs→   r0, #0
> >      3fc8:→  4909      → ldr→r1, [pc, #36]→  ; (3ff0 <main+0x48>)
> >      3fca:→  4788      → blx→r1
> >
> > This is using GCC 12.3.0, cross-compiling for ARM on x86_64 (gcc -v output
> > below sig), with a command line like
> >
> > arm-none-eabi-gcc -o ../build/main/PAC5524/tmp/base/src/main.o
> > base/src/main.c <<-I options elided>>> -mcpu=cortex-m4 -march=armv7e-m
> > -mfpu=fpv4-sp-d16 -std=gnu99 -ffunction-sections -fno-omit-frame-pointer
> > -fno-strict-overflow -fsingle-precision-constant
> > -ftrivial-auto-var-init=zero -mthumb -mlittle-endian -mlong-calls
> > -mfloat-abi=hard -Og -c -MD -MP
> >
> > Removing any one of the -f options happens to align the stack correctly in
> > most cases (I've elided the -f options that don't affect this issue as far
> > as I can tell).
> >
> > Many thanks,
> >
> > Barrie
> >
> > gcc -v output:
> >
> > Using built-in specs.
> > COLLECT_GCC=arm-none-eabi-gcc
> > COLLECT_LTO_WRAPPER=/usr/share/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-eabi/bin/../libexec/gcc/arm-none-eabi/12.3.1/lto-wrapper
> > Target: arm-none-eabi
> > Configured with:
> > /data/jenkins/workspace/GNU-toolchain/arm-12/src/gcc/configure
> > --target=arm-none-eabi
> > --prefix=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install
> > --with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> > --with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> > --with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> > --with-isl=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> > --disable-shared --disable-nls --disable-threads --disable-tls
> > --enable-checking=release --enable-languages=c,c++,fortran
> > --with-newlib --with-gnu-as --with-gnu-ld
> > --with-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install/arm-none-eabi
> > --with-multilib-list=aprofile,rmprofile
> > --with-pkgversion='Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35)'
> > --with-bugurl=https://bugs.linaro.org/
> > Thread model: single
> > Supported LTO compression algorithms: zlib
> > gcc version 12.3.1 20230626 (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35))
> >
> > Test code (the LED lights very prettily when va_arg() returns the correct
> > value):
> >
> > void va_args_test(int i, ...) {
> >      va_list args;
> >      va_start(args, i);
> >      i = (int)va_arg(args, double);
> >      va_end(args);
> >      bal_init();
> >      bal_set_AUX_LED1(i == 1);
> > }
> >
> > int main(void) {
> >     ...CPU initialization elided...
> >      va_args_test(0, (double)1.0);
> >      while (true) {
> >      }
> > }
>
>
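
For anyone finding this thread later: the address arithmetic in the
disassembly above can be replayed on a host machine to see why a stack
pointer that is only 4-byte aligned makes va_arg() read the wrong words.
The sketch below is only an illustration, not vendor or GCC code. The two
offsets are read off the listing (va_start stores r7 + 20, and the caller's
r2:r3 pair holding the double is saved at r7 + 24); all function names and
the example addresses are made up for this note. It also shows the usual
remedy of rounding the initial stack top down to a multiple of 8 before
main() is entered, which is what the startup code is expected to guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets read off the disassembly: r7 equals sp after the prologue,
 * va_start stores r7 + 20, and the saved r2:r3 pair (the double passed
 * by the caller) lives at r7 + 24. */
enum { VA_START_OFFSET = 20, DOUBLE_SAVE_OFFSET = 24 };

/* Hypothetical helper: the address va_arg(args, double) reads from,
 * mirroring "add.w r3, r7, #27" followed by "bic.w r3, r3, #7". */
static uint32_t va_arg_double_addr(uint32_t r7)
{
    return (r7 + VA_START_OFFSET + 7u) & ~UINT32_C(7);
}

/* Hypothetical helper: where the prologue actually saved the double. */
static uint32_t saved_double_addr(uint32_t r7)
{
    return r7 + DOUBLE_SAVE_OFFSET;
}

static bool is_8_byte_aligned(uint32_t addr)
{
    return (addr & 7u) == 0;
}

/* Hypothetical remedy for startup code: round the initial stack top
 * down to an 8-byte boundary before calling main(). */
static uint32_t align_stack_top(uint32_t top)
{
    return top & ~UINT32_C(7);
}

/* Replays both cases with made-up RAM addresses. */
static void check_examples(void)
{
    uint32_t aligned_r7 = UINT32_C(0x20001000);    /* multiple of 8 */
    uint32_t misaligned_r7 = UINT32_C(0x20000ffc); /* only 4-byte aligned */

    /* Aligned frame: the rounding lands exactly on the saved r2:r3. */
    assert(is_8_byte_aligned(aligned_r7));
    assert(va_arg_double_addr(aligned_r7) == saved_double_addr(aligned_r7));

    /* Misaligned frame: the rounding lands one word early, so ldrd
     * would read the saved r1:r2 instead of r2:r3. */
    assert(!is_8_byte_aligned(misaligned_r7));
    assert(va_arg_double_addr(misaligned_r7) ==
           saved_double_addr(misaligned_r7) - 4u);

    /* Rounding the stack top down restores the invariant. */
    assert(is_8_byte_aligned(align_stack_top(misaligned_r7)));
}
```

Calling check_examples() from a host-side test program exercises both
cases. The prologue in the listing moves sp by 32 bytes in total, so r7 has
the same alignment modulo 8 as sp had at the call, which is why a
misaligned sp on entry to main() carries straight through to va_arg().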

