This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Julian Brown <julian at codesourcery dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Richard Earnshaw <rearnsha at arm dot com>, Ramana Radhakrishnan <ramrad01 at arm dot com>
- Date: Sat, 20 Oct 2012 16:41:54 -0700
- Subject: Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode
- References: <20121020123824.4e9251b5@octopus>
On Sat, Oct 20, 2012 at 4:38 AM, Julian Brown <julian@codesourcery.com> wrote:
> Hi,
>
> Quite a few tests fail for big-endian multilibs which use VFP
> instructions at present. One reason for many of these is glaringly
> obvious once you notice it: for D registers interpreted as two S
> registers, the lower-numbered register is always the less-significant
> part of the value, and the higher-numbered register the
> more-significant -- regardless of the endianness the processor is
> running in.
>
> However, for big-endian mode, when DFmode values are represented in
> memory (or indeed core registers), the opposite is true. So, a subreg
> expression such as the following will work fine on core registers (or
> e.g. pseudos assigned to stack slots):
>
> (subreg:SI (reg:DF) 0)
>
> but, when applied to a VFP register Dn, it should be resolved to the
> hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e.
> the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should
> be the most-significant part of the value). For the relatively few cases
> where DFmode values are interpreted as a pair of (integer) words, this
> means that wrong code is generated.
>
> My feeling is that implementing a "proper" solution to this problem is
> probably impractical -- the closest existing macros to control
> behaviour aren't sufficient for this case:
>
> * FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct
> as is it.
>
> * REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian
> order in registers, but refers to *all* registers. We only want to
> change the behaviour for the VFP registers. Defining a new macro
> FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would
> differ depending on the hard register under observation: that seems
> like too much to ask of generic machinery in the middle-end.
>
> So, the attached patch just avoids the problem, by pretending that
> greater-than-word-size values in VFP registers, in big-endian mode, are
> opaque and cannot be subreg'ed. In practice, for at least the test case
> I looked at, this isn't as much of a pessimisation as you might expect
> -- the value in question might already be stored in core registers
> (e.g. for function arguments with -mfloat-abi=softfp), so can be
> retrieved directly from those rather than via memory.
>
> This is the testsuite delta for current FSF mainline, with multilibs
> adjusted to build for little/big-endian, and using options
> "-mbig-endian -mfloat-abi=softfp -mfpu=vfpv3" for testing:
>
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O1 execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -fomit-frame-pointer execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -g execution test
> FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -Os execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O1
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -fomit-frame-pointer
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Og -g
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Os
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O1 execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O3 -fomit-frame-pointer execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O3 -g execution test
> FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -Os execution test
>
> OK for mainline, or any comments? (I've included the multilib tweaks I
> used in the attached patch for reference, though I'm not proposing to
> apply those.)
I also tested this on GCC 4.7.0 with armeb-linux-gnueabi defaulting to
hardfloat ABI and fixes a lot of failures there too.
Thanks,
Andrew Pinski
>
> Thanks,
>
> Julian
>
> ChangeLog
>
> gcc/
> * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Avoid subreg'ing
> VFP D registers in big-endian mode.