This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, cprop] Check rtx_cost when propagating constant
- From: Zhenqiang Chen <zhenqiang dot chen at linaro dot org>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Andrew Pinski <pinskia at gmail dot com>
- Date: Tue, 17 Jun 2014 17:42:27 +0800
- Subject: Re: [PATCH, cprop] Check rtx_cost when propagating constant
- Authentication-results: sourceware.org; auth=none
- References: <CACgzC7A8cVyoyFQO24eOo9t0+UEuKcYtWdANiJvgukcg_P9baw at mail dot gmail dot com> <CAFiYyc3=-0aeeGuMiQepEBFhiJqxxLKD-WZ2RPRJA95PTdJ_EA at mail dot gmail dot com>
On 17 June 2014 16:15, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
> <zhenqiang.chen@linaro.org> wrote:
>> Hi,
>>
>> For some large constant, ports like ARM, need one more instructions to
>> operate it. e.g
>>
>> #define MASK 0xfe00ff
>> void maskdata (int * data, int len)
>> {
>> int i = len;
>> for (; i > 0; i -= 2)
>> {
>> data[i] &= MASK;
>> data[i + 1] &= MASK;
>> }
>> }
>>
>> Need two instructions for each AND operation:
>>
>> and r3, r3, #16711935
>> bic r3, r3, #65536
>>
>> If we keep the MASK in a register, loop2_invariant pass can hoist it
>> out the loop. And it can be shared by different references.
>>
>> So the patch skips constant propagation if it makes INSN's cost higher.
>
> So cprop undos invariant motions work here?
Yes. GLOBAL CONST-PROP will undo invariant motions.
> Should we make sure we add a REG_EQUAL note when not propagating?
Logs show there already has REG_EQUAL note.
>> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>>
>> OK for trunk?
>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-06-17 Zhenqiang Chen <zhenqiang.chen@linaro.org>
>>
>> * cprop.c (try_replace_reg): Check cost for constants.
>>
>> diff --git a/gcc/cprop.c b/gcc/cprop.c
>> index aef3ee8..c9cf02a 100644
>> --- a/gcc/cprop.c
>> +++ b/gcc/cprop.c
>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>> rtx src = 0;
>> int success = 0;
>> rtx set = single_set (insn);
>> + int old_cost = 0;
>> + bool copy_p = false;
>> + bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
>> +
>> + if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
>> + copy_p = true;
>> + else
>> + old_cost = set_rtx_cost (set, speed);
>
> Looks bogus for set == NULL?
set_rtx_cost has checked it. If it is NULL, the function will return 0;
> Also what about register pressure?
Do you think it has big register pressure impact? I think it does not
increase register pressure.
> I think this kind of change needs wider testing as RTX costs are
> usually not fully implemented and you introduce a new use kind
> (or is it already used elsewhere in this way to compute cost
> difference of a set with s/reg/const?).
Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the
decision. e.g. in function attempt_change of auto-inc-dec.c, it has
code segments like:
old_cost = (set_src_cost (mem, speed)
+ set_rtx_cost (PATTERN (inc_insn.insn), speed));
new_cost = set_src_cost (mem_tmp, speed);
...
if (old_cost < new_cost)
{
...
return false;
}
The usage of RTX costs in this patch is similar.
I had run X86-64 bootstrap and regression tests with
--enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java
And ARM bootstrap and regression tests with
--enable-languages=c,c++,fortran,lto,objc,obj-c++
I will run tests on i686. What other tests do you think I have to run?
> What kind of performance difference do you see?
I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm
backend changes). Coremark with some options show >10% performance
improvement. dhrystone is a little better. Some wave in eembc, but
overall result is better.
I will run spec2000 on X86-64 and ARM, and back to you about the
performance changes.
Thanks!
-Zhenqiang
> Thanks,
> Richard.
>
>> /* Usually we substitute easy stuff, so we won't copy everything.
>> We however need to take care to not duplicate non-trivial CONST
>> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>> to = copy_rtx (to);
>>
>> validate_replace_src_group (from, to, insn);
>> +
>> + /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
>> + And it can be shared by different references. So skip propagation if
>> + it makes INSN's rtx cost higher. */
>> + if (set && !copy_p && CONSTANT_P (to))
>> + {
>> + int new_cost = set_rtx_cost (set, speed);
>> + if (new_cost > old_cost)
>> + {
>> + cancel_changes (0);
>> + return false;
>> + }
>> + }
>> +
>> if (num_changes_pending () && apply_change_group ())
>> success = 1;