Bug 105981 - [10/11/12 regression] Wrong code from gen_cpymem_ldrd_strd() with -mbig-endian
Summary: [10/11/12 regression] Wrong code from gen_cpymem_ldrd_strd() with -mbig-endian
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 12.1.0
: P3 normal
Target Milestone: 10.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2022-06-14 19:21 UTC by Gonzalo
Modified: 2022-06-16 12:51 UTC (History)
1 user (show)

See Also:
Host:
Target: armeb-none-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-06-15 00:00:00


Attachments
files with the source and results for cortex-a72 and cortex-a53 (804 bytes, application/x-tar)
2022-06-14 19:21 UTC, Gonzalo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gonzalo 2022-06-14 19:21:28 UTC
Created attachment 53137 [details]
files with the source and results for cortex-a72 and cortex-a53

When compiling a file("test.c") with a simple function like this:
void test(char *buf)
{
    __builtin_strcpy(buf, "abcd1234");
}

With this:
../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -c -Wall -Wextra -O3 -mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72 test.c -o test.a72

The code generated is:
.LC0:
        .ascii  "abcd1234\000"
        .text
        .align  2
        .global test
        .syntax unified
        .arm
        .type   test, %function
test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r3, #:lower16:.LC0
        movt    r3, #:upper16:.LC0
        strd    r4, [sp, #-8]!
        ldrd    r4, [r3]
        ldrb    r3, [r3, #8]    @ zero_extendqisi2
        str     r5, [r0]        @ unaligned
        str     r4, [r0, #4]    @ unaligned
        ldrd    r4, [sp]
        add     sp, sp, #8
        strb    r3, [r0, #8]
        bx      lr


String read:
In "ldrd    r4, [r3]" It reads the string "abcd1234" into r4 and r5. (I am skipping the null string terminator. It is written ok).


Buffer fill:
Then, it writes r5 into the start of the buffer "str     r5, [r0]"
And then r4 in the offset 4 of the buffer "str     r4, [r0, #4]".

The result is that we get the buffer with the string 32bits-swapped "1234abcd".

*** WRONG RESULT ***






But, if we compile the same file with -mcpu=cortex-a53 it works fine:
.LC0:
        .ascii  "abcd1234\000"
        .text
        .align  2
        .global test
        .syntax unified
        .arm
        .type   test, %function
test:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw    r3, #:lower16:.LC0
        movt    r3, #:upper16:.LC0
        mov     r2, r0
        ldmia   r3!, {r0, r1}
        str     r0, [r2]        @ unaligned
        str     r1, [r2, #4]    @ unaligned
        ldrb    r3, [r3]        @ zero_extendqisi2
        strb    r3, [r2, #8]
        bx      lr

String read:
        ldmia   r3!, {r0, r1}
Buffer fill:
        str     r0, [r2]        @ unaligned
        str     r1, [r2, #4]    @ unaligned

*** CORRECT ORDER ***




The gcc release:
git log -1
commit 1ea978e3066ac565a1ec28a96a4d61eaf38e2726 (HEAD, tag: releases/gcc-12.1.0)
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri May 6 07:07:53 2022 +0000

    Update ChangeLog and version files for release


git remote -v
origin  git://gcc.gnu.org/git/gcc.git (fetch)
origin  git://gcc.gnu.org/git/gcc.git (push)




The output when I add the "-v" option:

../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -v -c -Wall -Wextra -O3 -mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72 test.c -o test.a72
Using built-in specs.
COLLECT_GCC=../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc
Target: armeb-none-eabi
Configured with: /home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=armeb-none-eabi --prefix=/home/gjimenez/x-tools/armeb-none-eabi --exec_prefix=/home/gjimenez/x-tools/armeb-none-eabi --with-local-prefix=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi --with-headers=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi/include --with-newlib --enable-threads=no --disable-shared --with-pkgversion='crosstool-NG 1.25.0.36_a3e3d73' --enable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp --disable-libquadmath --disable-libquadmath-support --disable-libstdcxx-verbose --with-gmp=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-mpfr=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-mpc=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --with-isl=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools --disable-lto --enable-target-optspace --disable-nls --enable-multiarch --with-multilib-list='rmprofile aprofile' --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (crosstool-NG 1.25.0.36_a3e3d73)
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc'
 /home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/cc1 -E -quiet -nostdinc -v -imultilib thumb/v8-a/nofp -D__USES_INITFINI__ test.c -mbig-endian -mcpu=cortex-a72 -mfloat-abi=soft -marm -mlibarch=armv8-a+crc -march=armv8-a+crc -Wall -Wextra -ffreestanding -fbuiltin -O3 -fpch-preprocess -o test.i
#include "..." search starts here:
#include <...> search starts here:
End of search list.
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc'
 /home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/cc1 -fpreprocessed test.i -quiet -dumpbase test.c -dumpbase-ext .c -mbig-endian -mcpu=cortex-a72 -mfloat-abi=soft -marm -mlibarch=armv8-a+crc -march=armv8-a+crc -O3 -Wall -Wextra -version -ffreestanding -fbuiltin -o test.s
GNU C17 (crosstool-NG 1.25.0.36_a3e3d73) version 12.1.0 (armeb-none-eabi)
        compiled by GNU C version 9.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (crosstool-NG 1.25.0.36_a3e3d73) version 12.1.0 (armeb-none-eabi)
        compiled by GNU C version 9.4.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 3261fde4c69fe46364115246804cccd4
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc'
 /home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/bin/as -EB -march=armv8-a+crc -mfloat-abi=soft -meabi=5 -o test.a72 test.s
COMPILER_PATH=/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/libexec/gcc/armeb-none-eabi/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/bin/
LIBRARY_PATH=/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/thumb/v8-a/nofp/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/lib/thumb/v8-a/nofp/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/:/home/gjimenez/x-tools/armeb-none-eabi/lib/gcc/armeb-none-eabi/12.1.0/../../../../armeb-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian' '-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o' 'test.a72' '-mfloat-abi=soft' '-marm' '-mlibarch=armv8-a+crc' '-march=armv8-a+crc' '-dumpdir' 'test.'
Comment 1 Richard Earnshaw 2022-06-15 13:27:48 UTC
Confirmed.  Looking at the sources, I think this has been broken since r230663.
Comment 2 GCC Commits 2022-06-15 15:09:17 UTC
The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3

commit r13-1110-g8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
Comment 3 Richard Earnshaw 2022-06-15 15:10:44 UTC
Fixed on trunk so far.
Comment 4 GCC Commits 2022-06-16 12:46:34 UTC
The releases/gcc-12 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:723c1d6284ca9f79cc35bf7bf49f391417773f83

commit r12-8488-g723c1d6284ca9f79cc35bf7bf49f391417773f83
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.cc (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
    
    (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)
Comment 5 GCC Commits 2022-06-16 12:46:58 UTC
The releases/gcc-11 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:2a9c9ed732d6a7f4565a286bdeb953f08fbe8fb9

commit r11-10074-g2a9c9ed732d6a7f4565a286bdeb953f08fbe8fb9
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.c (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
    
    (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)
Comment 6 GCC Commits 2022-06-16 12:47:20 UTC
The releases/gcc-10 branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:20ece449e06f2d0dd0c50db0203d13b4e4ff2d96

commit r10-10850-g20ece449e06f2d0dd0c50db0203d13b4e4ff2d96
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Wed Jun 15 16:07:20 2022 +0100

    arm: big-endian issue in gen_cpymem_ldrd_strd [PR105981]
    
    The code in gen_cpymem_ldrd_strd has been incorrect for big-endian
    since r230663.  The problem is that we use gen_lowpart, etc. to split
    the 64-bit quantity, but fail to account for the fact that these
    routines are really dealing with 64-bit /values/ and in big-endian the
    ordering of the sub-registers changes.
    
    To fix this, I've renamed the conceptually misnamed low_reg and hi_reg
    as first_reg and second_reg, and then used different logic for
    big-endian targets to initialize these values.  This makes the logic
    clearer than trying to think about high bits and low bits.
    
    gcc/ChangeLog:
    
            PR target/105981
            * config/arm/arm.c (gen_cpymem_ldrd_strd): Rename low_reg and hi_reg
            to first_reg and second_reg respectively.  Initialize them correctly
            when generating big-endian code.
    
    (cherry picked from commit 8aaa948059a8b5f0a62ad010d0aa6346b7ac9cd3)
Comment 7 Richard Earnshaw 2022-06-16 12:51:34 UTC
Fixed on all active release branches.