This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus

From: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
To: GCC Patches <gcc-patches at gcc dot gnu dot org>
Cc: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>, Richard Earnshaw <Richard dot Earnshaw at arm dot com>
Date: Wed, 02 Mar 2016 13:46:08 +0000
Subject: Re: [PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus
Authentication-results: sourceware.org; auth=none
References: <56BA2014 dot 1020708 at foss dot arm dot com> <56BA20CF dot 5090108 at foss dot arm dot com> <56C4479D dot 8010101 at foss dot arm dot com> <56CDB49F dot 40109 at foss dot arm dot com>

Ping*3.

Thanks,
Kyrill
On 24/02/16 13:48, Kyrill Tkachov wrote:

Ping*2

Thanks,
Kyrill

On 17/02/16 10:12, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00634.html

As mentioned before, this is actually a fix for PR target/69538.
I got confused when writing the cover letter and ChangeLog...

Thanks,
Kyrill

On 09/02/16 17:24, Kyrill Tkachov wrote:


On 09/02/16 17:21, Kyrill Tkachov wrote:

Hi all,

In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
with only single-precision capabilities.

bar is a function returning a double. For non-LTO compilation the caller of bar reads its return value
from the s0 and s1 VFP registers as expected, but with -flto the caller seems to expect the
return value in the r0 and r1 regs.  The RTL dumps show that too.

Debugging the calls to arm_function_value shows that in the -flto compilation the function bar is deemed
to be a local function and is assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas the non-LTO (and non-breaking)
compilation uses the ARM_PCS_AAPCS_VFP variant.

Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
on an FPU that is not TARGET_VFP_DOUBLE.

This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
the translation unit, so we can use whatever variant we want. From what I understand we want to use the
VFP regs for FP values whenever possible.

So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
value of bar from the VFP registers and everything works.
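
To make the change concrete, here is a minimal standalone sketch of the decision described above. It is a simplified model, not the actual code in config/arm/arm.c: the enum, the struct target and its fields, and the _old/_new function names are hypothetical stand-ins, and the real function checks further target conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PCS variants named in this message. */
enum pcs_variant { PCS_AAPCS, PCS_AAPCS_VFP, PCS_AAPCS_LOCAL };

/* Hypothetical target description; in GCC these are target macros. */
struct target {
    bool has_vfp;        /* some VFP unit is available */
    bool has_vfp_double; /* the VFP unit supports double precision */
};

/* Model of the check before the patch, as described above: the LOCAL
   variant rejects VFP registers for a double on a single-precision FPU. */
static bool use_vfp_abi_old(const struct target *t, enum pcs_variant v,
                            bool is_double)
{
    if (v == PCS_AAPCS_VFP)
        return true;
    if (v != PCS_AAPCS_LOCAL)
        return false;
    return t->has_vfp && (t->has_vfp_double || !is_double);
}

/* Model of the check after the patch: the is_double argument is gone,
   so the LOCAL variant uses VFP registers whenever a VFP unit exists. */
static bool use_vfp_abi_new(const struct target *t, enum pcs_variant v)
{
    if (v == PCS_AAPCS_VFP)
        return true;
    if (v != PCS_AAPCS_LOCAL)
        return false;
    return t->has_vfp;
}

/* The failing combination: double value, LOCAL variant, SP-only FPU. */
static void check_sp_only_case(void)
{
    struct target sp_only = { true, false };

    /* Before: the LOCAL side is steered away from VFP registers... */
    assert(!use_vfp_abi_old(&sp_only, PCS_AAPCS_LOCAL, true));
    /* ...while the VFP variant still uses them, so the two sides disagree. */
    assert(use_vfp_abi_old(&sp_only, PCS_AAPCS_VFP, true));
    /* After: both variants agree on VFP registers. */
    assert(use_vfp_abi_new(&sp_only, PCS_AAPCS_LOCAL));
    assert(use_vfp_abi_new(&sp_only, PCS_AAPCS_VFP));
}
```

In this model the only input whose answer changes is the LOCAL variant with a double on a single-precision-only FPU, which is the case the testcase hits under -flto.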

This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
The bootstrap was performed with LTO.
I didn't see any regressions.

It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
floating point.

I'm not very familiar with this part of the code, so is this a safe patch to do?
The patch should only ever change behaviour for single-precision-only fpus and only for static functions
that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
be any ABI problems, I think.

Is this ok for trunk?

Thanks,
Kyrill


Huh, I just realised I wrote completely the wrong PR number on this.
The PR I'm talking about here is PR target/69538

Sorry for the confusion.

Kyrill

2016-02-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>

    PR target/65578
    * config/arm/arm.c (use_vfp_abi): Remove is_double argument.
    Don't check for is_double and TARGET_VFP_DOUBLE.
    (aapcs_vfp_is_call_or_return_candidate): Update callsite.
    (aapcs_vfp_is_return_candidate): Likewise.
    (aapcs_vfp_is_call_candidate): Likewise.
    (aapcs_vfp_allocate_return_reg): Likewise.

