This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, ARM] Keep constants in register when expanding
- From: Zhenqiang Chen <zhenqiang dot chen at linaro dot org>
- To: Ramana Radhakrishnan <ramrad01 at arm dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 12 Aug 2014 13:37:17 +0800
- Subject: Re: [PATCH, ARM] Keep constants in register when expanding
- References: <CACgzC7Bx7fSLmYysz=09D4yi+_QcfLo4BXK3gEGa8kmywE0_JQ at mail dot gmail dot com> <CAJA7tRaHUngFY+=t8C0jaQmt-5uxFu4fEkAsj7u26Vp75x9PfA at mail dot gmail dot com> <CACgzC7Bsp96OPM_W43-VW8497Dd57Z5zuBVTA0ifeXkMQCYgbw at mail dot gmail dot com> <CAJA7tRa2T2GYNsPRWdd6+u6chK4MMp_XOoNJfkH3emxSHfN_WQ at mail dot gmail dot com>
On 11 August 2014 19:14, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Aug 11, 2014 at 3:35 AM, Zhenqiang Chen
> <zhenqiang.chen@linaro.org> wrote:
>> On 8 August 2014 23:22, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>> On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen
>>> <zhenqiang.chen@linaro.org> wrote:
>>>> Hi,
>>>>
>>>> For some large constants, ARM splits them during expansion, which
>>>> makes it impossible to hoist them out of the loop or to share them
>>>> between different references (see the test case in the patch).
>>>>
>>>> The patch keeps some constants in registers. If a constant cannot
>>>> be hoisted or shared, the cprop and combine passes can still optimize
>>>> it, as the current expand pass does, with
>>>>
>>>> define_insn_and_split "*arm_subsi3_insn"
>>>> define_insn_and_split "*arm_andsi3_insn"
>>>> define_insn_and_split "*iorsi3_insn"
>>>> define_insn_and_split "*arm_xorsi3"
>>>>
>>>> The patch does not modify addsi3, since the define_insn_and_split
>>>> "*arm_addsi3" is only valid when (reload_completed ||
>>>> !arm_eliminable_register (operands[1])). The cprop and combine passes
>>>> cannot optimize the large constant if we put it in a register, which
>>>> would lead to regressions.
>>>>
>>>> For logic operators, the patch skips constants that satisfy:
>>>>
>>>> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]))
>>>>
>>>> since the expand pass always sign-extends the value
>>>> (trunc_int_for_mode called from immed_wide_int_const) for RTL, and
>>>> logs show that most negative values are UNSIGNED when they are TREE
>>>> nodes. The combine pass is smart enough to convert the negative value
>>>> back to a positive one.
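As an illustration of the skip condition quoted above (this is not code
from the patch), here is a minimal standalone sketch. It assumes ARM state
only, where a data-processing immediate is an 8-bit value rotated right by
an even amount. The names imm_ok_arm_state and skip_keep_in_reg are
hypothetical; the first is a simplified stand-in for GCC's
const_ok_for_arm, which covers more cases (Thumb-2 immediates, for
example).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for const_ok_for_arm (ARM state only):
   true if v can be written as an 8-bit value rotated right by an even
   amount.  */
static bool imm_ok_arm_state(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotate left by rot to undo a rotate-right-by-rot encoding.  */
        uint32_t r = rot ? ((v << rot) | (v >> (32 - rot))) : v;
        if (r <= 0xffu)
            return true;
    }
    return false;
}

/* Mirrors the quoted condition: the value is negative and its negation
   is directly encodable, so the constant is left alone.  */
static bool skip_keep_in_reg(int32_t val)
{
    return val < 0 && imm_ok_arm_state(0u - (uint32_t)val);
}
```

With this sketch, skip_keep_in_reg(-256) is true because 256 is an
encodable immediate, while a positive value such as 256 never satisfies
the condition.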
>>>
>>> I am unable to verify any change in code generation for this testcase
>>> with and without the patch when I had a play with the patch.
>>>
>>> what gives ?
>>
>> Thanks for trying the patch.
>>
>> Did you add the option -fno-gcse, which is mentioned in dg-options
>> " -O2 -fno-gcse "? Without it, there is no change for the testcase,
>> since the cprop pass will propagate the constant to the AND expr (a
>> patch to enhance the cprop pass was discussed at
>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html).
>
> Probably not and I can now see the difference in code generated for
> Thumb state. Why is it that in ARM state with -mcpu=cortex-a15 we see
> the hoisting of the constant without your patch with -fno-gcse ?
The difference between ARM and Thumb-2 modes is due to a difference in
rtx_cost. For ARM mode, the constant is forced into a register
(force_reg) in function avoid_expensive_constant (optabs.c) before
gen_andsi3 when expanding.
> So, the patch improves code generation for -mcpu=cortex-a15 -mthumb
> -fno-gcse for the given testcase ?
Yes.
>>
>> In addition, if the constant cannot be hoisted out of the loop, the
>> later combine pass can also optimize it. These passes (cprop and
>> combine) are the reasons why the patch itself has little impact on
>> current tests.
>
> Does this mean you need the referred to patch to be useful as a
> pre-requisite ? I fail to understand why this patch needs to go in if
> it makes no difference without disabling GCSE. I cannot see -fno-gcse
> being used by default for performant code.
For some code, -fno-gcse might give better performance. Please refer to
the paper:
A case study: optimizing GCC on ARM for performance of libevas
rasterization library
http://ctuning.org/dissemination/grow10-03.pdf
The issues mentioned in the paper have been solved, since
arm_split_constant is smart enough to handle 0xff00ff. But what
about other irregular constants?
The patch gives a chance to handle them.
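A rough sketch of the kind of loop I have in mind (this is not the test
case from the patch; the function name and the mask value are made up for
illustration). The mask is not an 8-bit rotated immediate, so it cannot be
encoded directly in a single AND. Whether the constant is actually
materialized once outside the loop depends on the target, the options, and
the compiler version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "irregular" constant, chosen only for illustration.  */
#define IRREGULAR_MASK 0x12345678u

/* Sum the masked elements.  The same constant is needed on every
   iteration, so keeping it in one register is the desirable outcome.  */
uint32_t sum_masked(const uint32_t *src, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += src[i] & IRREGULAR_MASK;
    return sum;
}
```

Comparing the assembly for such a function with and without the patch
(for example with -O2 -fno-gcse -mthumb) would show whether the constant
is built inside or outside the loop.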
Thanks!
-Zhenqiang
> regards
> Ramana