[PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus

Wed Feb 24 13:48:00 GMT 2016

Ping*2

Thanks,
Kyrill

On 17/02/16 10:12, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00634.html
>
> As mentioned before, this is actually a fix for PR target/69538.
> I got confused when writing the cover letter and ChangeLog...
>
> Thanks,
> Kyrill
>
> On 09/02/16 17:24, Kyrill Tkachov wrote:
>>
>> On 09/02/16 17:21, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only when targeting an fpu
>>> with only single-precision capabilities.
>>>
>>> bar is a function returing a double. For non-LTO compilation the caller of bar reads the return value
>>> from it from the s0 and s1 VFP registers like expected, but for -flto the caller seems to expect the
>>> return value from the r0 and r1 regs.  The RTL dumps show that too.
>>>
>>> Debugging the calls to arm_function_value show that in the -flto compilation the function bar is deemed
>>> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS variant, whereas for the non-LTO (and non-breaking)
>>> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>>>
>>> Further down in use_vfp_abi when deciding whether to use VFP registers for the result there is a bit of
>>> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL variant with a double precision value
>>> on an FPU that is not TARGET_VFP_DOUBLE.
>>>
>>> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means that the function doesn't escape
>>> the translation unit and we can thus use whatever variant we want. From what I understand we want to use the
>>> VFP regs when possible for FP values.
>>>
>>> So this patch removes that restriction and for the testcase the caller of bar correctly reads the return
>>> value of bar from the VFP registers and everything works.
>>>
>>> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf configured with --with-fpu=fpv4-sp-d16.
>>> The bootstrapped was performed with LTO.
>>> I didn't see any regressions.
>>>
>>> It seems that this logic was put there in 2009 with r154034 as part of a large patch to enable support for half-precision
>>> floating point.
>>>
>>> I'm not very familiar with this part of the code, so is this a safe patch to do?
>>> The patch should only ever change behaviour for single-precision-only fpus and only for static functions
>>> that don't get called outside their translation units (or during LTO I suppose) so there shouldn't
>>> be any ABI problems, I think.
>>>
>>> Is this ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>
>> Huh, I just realised I wrote completely the wrong PR number on this.
>> The PR I'm talking about here is PR target/69538
>>
>> Sorry for the confusion.
>>
>> Kyrill
>>
>>
>>> 2016-02-09 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>>     PR target/65578
>>>     * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
>>>     Don't check for is_double and TARGET_VFP_DOUBLE.
>>>     (aapcs_vfp_is_call_or_return_candidate): Update callsite.
>>>     (aapcs_vfp_is_return_candidate): Likewise.
>>>     (aapcs_vfp_is_call_candidate): Likewise.
>>>     (aapcs_vfp_allocate_return_reg): Likewise.
>>
>