Created attachment 53137 [details] files with the source and results for cortex-a72 and cortex-a53 When compiling a file("test.c") with a simple function like this: void test(char *buf) { __builtin_strcpy(buf, "abcd1234"); } With this: ../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -c -Wall -Wextra -O3 -mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72 test.c -o test.a72 The code generated is: .LC0: .ascii "abcd1234\000" .text .align 2 .global test .syntax unified .arm .type test, %function test: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movw r3, #:lower16:.LC0 movt r3, #:upper16:.LC0 strd r4, [sp, #-8]! ldrd r4, [r3] ldrb r3, [r3, #8] @ zero_extendqisi2 str r5, [r0] @ unaligned str r4, [r0, #4] @ unaligned ldrd r4, [sp] add sp, sp, #8 strb r3, [r0, #8] bx lr String read: In "ldrd r4, [r3]" It reads the string "abcd1234" into r4 and r5. (I am skipping the null string terminator. It is written ok). Buffer fill: Then, it writes r5 into the start of the buffer "str r5, [r0]" And then r4 in the offset 4 of the buffer "str r4, [r0, #4]". The result is that we get the buffer with the string 32bits-swapped "1234abcd". *** WRONG RESULT *** But, if we compile the same file with -mcpu=cortex-a53 it works fine: .LC0: .ascii "abcd1234\000" .text .align 2 .global test .syntax unified .arm .type test, %function test: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movw r3, #:lower16:.LC0 movt r3, #:upper16:.LC0 mov r2, r0 ldmia r3!, {r0, r1} str r0, [r2] @ unaligned str r1, [r2, #4] @ unaligned ldrb r3, [r3] @ zero_extendqisi2 strb r3, [r2, #8] bx lr String read: ldmia r3!, {r0, r1} Buffer fill: str r0, [r2] @ unaligned str r1, [r2, #4] @ unaligned *** CORRECT ORDER *** The gcc release: git log -1 commit 1ea978e3066ac565a1ec28a96a4d61eaf38e2726 (HEAD, tag: releases/gcc-12.1.0) Author: Jakub Jelinek <jakub@redhat.com> Date: Fri May 6 07:07:53 2022 +0000 Update ChangeLog and version files for release git remote -v origin git://gcc.gnu.org/git/gcc.git (fetch) origin git://gcc.gnu.org/git/gcc.git (push) The output when I add the "-v" option: ../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -v -c -Wall -Wextra -O3 -mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72 test.c -o test.a72 Using built-in specs. COLLECT_GCC=../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc Target: armeb-none-eabi Configured with: /home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=armeb-none-eabi --prefix=/home/gjimenez/x-tools/armeb-none-eabi --exec_prefix=/home/gjimenez/x-tools/armeb-none-eabi --with-local-prefix=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi --with-headers=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi/include --with-newlib --enable-threads=no --disable-shared --with-pkgversion='crosstool-NG 1.25.0.36_a3e3d73' --enable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp --disable-libquadmath --disable-libquadmath-support --disable-libstdcxx-verbose --with-gmp=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-mpfr=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-mpc=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-isl=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --disable-lto --enable-target-optspace --disable-nls --enable-multiarch --with-multilib-list='rmprofile aprofile' --enable-languages=c,c++ Thread model: single Supported LTO compression algorithms: zlib gcc version 12.1.0 (crosstool-NG 1.25.0.36_a3e3d73) COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc' /home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/cc1 -E -quiet -nostdinc -v -imultilib thumb/v8-a/nofp -D__USES_INITFINI__ test.c -mbig-endian -mcpu=cortex-a72 -mfloat-abi=soft -marm -mlibarch=armv8-a+crc -march=armv8-a+crc -Wall -Wextra -ffreestanding -fbuiltin -O3 -fpch-preprocess -o test.i #include "..." search starts here: #include <...> search starts here: End of search list. COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc' /home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -dumpbase-ext .c -mbig-endian -mcpu=cortex-a72 -mfloat-abi=soft -marm -mlibarch=armv8-a+crc -march=armv8-a+crc -O3 -Wall -Wextra -version -ffreestanding -fbuiltin -o test.s GNU C17 (crosstool-NG 1.25.0.36_a3e3d73) version 12.1.0 (armeb-none-eabi) compiled by GNU C version 9.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU C17 (crosstool-NG 1.25.0.36_a3e3d73) version 12.1.0 (armeb-none-eabi) compiled by GNU C version 9.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 Compiler executable checksum: 3261fde4c69fe46364115246804cccd4 COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc' /home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/bin/as -EB -march=armv8-a+crc -mfloat-abi=soft -meabi=5 -o test.a72 test.s COMPILER_PATH=/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/bin/ LIBRARY_PATH=/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/thumb/v8-a/nofp/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/lib/thumb/v8-a/nofp/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/lib/ COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc' '-dumpdir' 'test.'
Confirmed. Looking at the sources, I think this has been broken since r230663.
The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>: https://gcc.gnu.org/g:8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3 commit r13-1110-g8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3 Author: Richard Earnshaw <rearnsha@arm.com> Date: Wed Jun 15 16:07:20 2022 +0100 arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981] The code in gen_cpymem_ldrd_strd has been incorrect for big-endian since r230663. The problem is that we use gen_lowpart, etc. to split the 64-bit quantity, but fail to account for the fact that these routines are really dealing with 64-bit /values/ and in big-endian the ordering of the sub-registers changes. To fix this, I've renamed the conceptually misnamed low_reg and hi_reg as first_reg and second_reg, and then used different logic for big-endian targets to initialize these values. This makes the logic clearer than trying to think about high bits and low bits. gcc/ChangeLog: PR target/105981 * config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg to first_reg and second_reg respectively. Initialize them correctly when generating big-endian code.
Fixed on trunk so far.
The releases/gcc-12 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>: https://gcc.gnu.org/g:723c1d6284ca9f79cc35bf7bf49f391417773f83 commit r12-8488-g723c1d6284ca9f79cc35bf7bf49f391417773f83 Author: Richard Earnshaw <rearnsha@arm.com> Date: Wed Jun 15 16:07:20 2022 +0100 arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981] The code in gen_cpymem_ldrd_strd has been incorrect for big-endian since r230663. The problem is that we use gen_lowpart, etc. to split the 64-bit quantity, but fail to account for the fact that these routines are really dealing with 64-bit /values/ and in big-endian the ordering of the sub-registers changes. To fix this, I've renamed the conceptually misnamed low_reg and hi_reg as first_reg and second_reg, and then used different logic for big-endian targets to initialize these values. This makes the logic clearer than trying to think about high bits and low bits. gcc/ChangeLog: PR target/105981 * config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg to first_reg and second_reg respectively. Initialize them correctly when generating big-endian code. (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)
The releases/gcc-11 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>: https://gcc.gnu.org/g:2a9c9ed732d6a7f4565a286bdeb953f08fbe8fb9 commit r11-10074-g2a9c9ed732d6a7f4565a286bdeb953f08fbe8fb9 Author: Richard Earnshaw <rearnsha@arm.com> Date: Wed Jun 15 16:07:20 2022 +0100 arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981] The code in gen_cpymem_ldrd_strd has been incorrect for big-endian since r230663. The problem is that we use gen_lowpart, etc. to split the 64-bit quantity, but fail to account for the fact that these routines are really dealing with 64-bit /values/ and in big-endian the ordering of the sub-registers changes. To fix this, I've renamed the conceptually misnamed low_reg and hi_reg as first_reg and second_reg, and then used different logic for big-endian targets to initialize these values. This makes the logic clearer than trying to think about high bits and low bits. gcc/ChangeLog: PR target/105981 * config/arm/arm.c (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg to first_reg and second_reg respectively. Initialize them correctly when generating big-endian code. (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)
The releases/gcc-10 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>: https://gcc.gnu.org/g:20ece449e06f2d0dd0c50db0203d13b4e4ff2d96 commit r10-10850-g20ece449e06f2d0dd0c50db0203d13b4e4ff2d96 Author: Richard Earnshaw <rearnsha@arm.com> Date: Wed Jun 15 16:07:20 2022 +0100 arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981] The code in gen_cpymem_ldrd_strd has been incorrect for big-endian since r230663. The problem is that we use gen_lowpart, etc. to split the 64-bit quantity, but fail to account for the fact that these routines are really dealing with 64-bit /values/ and in big-endian the ordering of the sub-registers changes. To fix this, I've renamed the conceptually misnamed low_reg and hi_reg as first_reg and second_reg, and then used different logic for big-endian targets to initialize these values. This makes the logic clearer than trying to think about high bits and low bits. gcc/ChangeLog: PR target/105981 * config/arm/arm.c (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg to first_reg and second_reg respectively. Initialize them correctly when generating big-endian code. (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)
Fixed on all active release branches.