105981 – [10/11/12 regression] Wrong code from gen_cpymem_ldrd_strd() with -mbig-endian

Bug 105981 - [10/11/12 regression] Wrong code from gen_cpymem_ldrd_strd() with -mbig-endian

Summary: [10/11/12 regression] Wrong code from gen_cpymem_ldrd_strd() with -mbig-endian

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	12.1.0

Importance:	P3 normal
Target Milestone:	10.4
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2022-06-14 19:21 UTC by Gonzalo
Modified:	2022-06-16 12:51 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:	armeb-none-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed:	2022-06-15 00:00:00

Attachments
files with the source and results for cortex-a72 and cortex-a53 (804 bytes, application/x-tar) 2022-06-14 19:21 UTC, Gonzalo	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Gonzalo 2022-06-14 19:21:28 UTC

Created attachment 53137 [details]
files with the source and results for cortex-a72 and cortex-a53

When compiling a file("test.c") with a simple function like this:
void test(char *buf)
{
    __builtin_strcpy(buf, "abcd1234");
}

With this:
../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -c -Wall -Wextra -O3 -mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72 test.c -o test.a72

The code generated is:
.LC0:
        .ascii  "abcd1234\000"
        .text
        .align  2
        .global test
        .syntax unified
        .arm
        .type   test, %function
test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r3, #:lower16:.LC0
        movt    r3, #:upper16:.LC0
        strd    r4, [sp, #-8]!
        ldrd    r4, [r3]
        ldrb    r3, [r3, #8]    @ zero_extendqisi2
        str     r5, [r0]        @ unaligned
        str     r4, [r0, #4]    @ unaligned
        ldrd    r4, [sp]
        add     sp, sp, #8
        strb    r3, [r0, #8]
        bx      lr


String read:
In "ldrd    r4, [r3]" It reads the string "abcd1234" into r4 and r5. (I am skipping the null string terminator. It is written ok).


Buffer fill:
Then, it writes r5 into the start of the buffer "str     r5, [r0]"
And then r4 in the offset 4 of the buffer "str     r4, [r0, #4]".

The result is that we get the buffer with the string 32bits-swapped "1234abcd".

*** WRONG RESULT ***






But, if we compile the same file with -mcpu=cortex-a53 it works fine:
.LC0:
        .ascii  "abcd1234\000"
        .text
        .align  2
        .global test
        .syntax unified
        .arm
        .type   test, %function
test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r3, #:lower16:.LC0
        movt    r3, #:upper16:.LC0
        mov     r2, r0
        ldmia   r3!, {r0, r1}
        str     r0, [r2]        @ unaligned
        str     r1, [r2, #4]    @ unaligned
        ldrb    r3, [r3]        @ zero_extendqisi2
        strb    r3, [r2, #8]
        bx      lr

String read:
        ldmia   r3!, {r0, r1}
Buffer fill:
        str     r0, [r2]        @ unaligned
        str     r1, [r2, #4]    @ unaligned

*** CORRECT ORDER ***




The gcc release:
git log -1
commit 1ea978e3066ac565a1ec28a96a4d61eaf38e2726 (HEAD, tag: releases/gcc-12.1.0)
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri May 6 07:07:53 2022 +0000

    Update ChangeLog and version files for release


git remote -v
origin  git://gcc.gnu.org/git/gcc.git (fetch)
origin  git://gcc.gnu.org/git/gcc.git (push)




The output when I add the "-v" option:

../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -v -c -Wall -Wextra -O3 -mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72 test.c -o test.a72
Using built-in specs.
COLLECT_GCC=../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc
Target: armeb-none-eabi
Configured with: /home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=armeb-none-eabi --prefix=/home/gjimenez/x-tools/armeb-none-eabi --exec_prefix=/home/gjimenez/x-tools/armeb-none-eabi --with-local-prefix=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi --with-headers=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi/include --with-newlib --enable-threads=no --disable-shared --with-pkgversion='crosstool-NG 1.25.0.36_a3e3d73' --enable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp --disable-libquadmath --disable-libquadmath-support --disable-libstdcxx-verbose --with-gmp=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-mpfr=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-mpc=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-isl=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --disable-lto --enable-target-optspace --disable-nls --enable-multiarch --with-multilib-list='rmprofile aprofile' --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (crosstool-NG 1.25.0.36_a3e3d73)
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc'
 /home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/cc1 -E -quiet -nostdinc -v -imultilib thumb/v8-a/nofp -D__USES_INITFINI__ test.c -mbig-endian -mcpu=cortex-a72 -mfloat-abi=soft -marm -mlibarch=armv8-a+crc -march=armv8-a+crc -Wall -Wextra -ffreestanding -fbuiltin -O3 -fpch-preprocess -o test.i
#include "..." search starts here:
#include <...> search starts here:
End of search list.
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc'
 /home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -dumpbase-ext .c -mbig-endian -mcpu=cortex-a72 -mfloat-abi=soft -marm -mlibarch=armv8-a+crc -march=armv8-a+crc -O3 -Wall -Wextra -version -ffreestanding -fbuiltin -o test.s
GNU C17 (crosstool-NG 1.25.0.36_a3e3d73) version 12.1.0 (armeb-none-eabi)
        compiled by GNU C version 9.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (crosstool-NG 1.25.0.36_a3e3d73) version 12.1.0 (armeb-none-eabi)
        compiled by GNU C version 9.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 3261fde4c69fe46364115246804cccd4
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc'
 /home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/bin/as -EB -march=armv8-a+crc -mfloat-abi=soft -meabi=5 -o test.a72 test.s
COMPILER_PATH=/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/bin/
LIBRARY_PATH=/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/thumb/v8-a/nofp/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/lib/thumb/v8-a/nofp/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc' '-dumpdir' 'test.'

Comment 1 Richard Earnshaw 2022-06-15 13:27:48 UTC

Confirmed.  Looking at the sources, I think this has been broken since r230663.

Comment 2 GCC Commits 2022-06-15 15:09:17 UTC

The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3

commit r13-1110-g8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.

Comment 3 Richard Earnshaw 2022-06-15 15:10:44 UTC

Fixed on trunk so far.

Comment 4 GCC Commits 2022-06-16 12:46:34 UTC

The releases/gcc-12 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:723c1d6284ca9f79cc35bf7bf49f391417773f83

commit r12-8488-g723c1d6284ca9f79cc35bf7bf49f391417773f83
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
    
    (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)

Comment 5 GCC Commits 2022-06-16 12:46:58 UTC

The releases/gcc-11 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:2a9c9ed732d6a7f4565a286bdeb953f08fbe8fb9

commit r11-10074-g2a9c9ed732d6a7f4565a286bdeb953f08fbe8fb9
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.c (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
    
    (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)

Comment 6 GCC Commits 2022-06-16 12:47:20 UTC

The releases/gcc-10 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:20ece449e06f2d0dd0c50db0203d13b4e4ff2d96

commit r10-10850-g20ece449e06f2d0dd0c50db0203d13b4e4ff2d96
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.c (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
    
    (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)

Comment 7 Richard Earnshaw 2022-06-16 12:51:34 UTC

Fixed on all active release branches.