This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, cprop] Check rtx_cost when propagating constant


On 17 June 2014 16:15, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
> <zhenqiang.chen@linaro.org> wrote:
>> Hi,
>>
>> For some large constant, ports like ARM, need one more instructions to
>> operate it. e.g
>>
>> #define MASK 0xfe00ff
>> void maskdata (int * data, int len)
>> {
>>    int i = len;
>>    for (; i > 0; i -= 2)
>>     {
>>       data[i] &= MASK;
>>       data[i + 1] &= MASK;
>>     }
>> }
>>
>> Need two instructions for each AND operation:
>>
>>     and    r3, r3, #16711935
>>     bic    r3, r3, #65536
>>
>> If we keep the MASK in a register, loop2_invariant pass can hoist it
>> out the loop. And it can be shared by different references.
>>
>> So the patch skips constant propagation if it makes INSN's cost higher.
>
> So cprop undos invariant motions work here?

Yes. GLOBAL CONST-PROP will undo invariant motions.

> Should we make sure we add a REG_EQUAL note when not propagating?

Logs show there already has REG_EQUAL note.

>> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>>
>> OK for trunk?
>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-06-17  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>>
>>         * cprop.c (try_replace_reg): Check cost for constants.
>>
>> diff --git a/gcc/cprop.c b/gcc/cprop.c
>> index aef3ee8..c9cf02a 100644
>> --- a/gcc/cprop.c
>> +++ b/gcc/cprop.c
>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>    rtx src = 0;
>>    int success = 0;
>>    rtx set = single_set (insn);
>> +  int old_cost = 0;
>> +  bool copy_p = false;
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
>> +
>> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
>> +    copy_p = true;
>> +  else
>> +    old_cost = set_rtx_cost (set, speed);
>
> Looks bogus for set == NULL?

set_rtx_cost has checked it. If it is NULL, the function will return 0;

> Also what about register pressure?

Do you think it has big register pressure impact? I think it does not
increase register pressure.

> I think this kind of change needs wider testing as RTX costs are
> usually not fully implemented and you introduce a new use kind
> (or is it already used elsewhere in this way to compute cost
> difference of a set with s/reg/const?).

Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the
decision. e.g. in function attempt_change of auto-inc-dec.c, it has
code segments like:

  old_cost = (set_src_cost (mem, speed)
              + set_rtx_cost (PATTERN (inc_insn.insn), speed));
  new_cost = set_src_cost (mem_tmp, speed);
  ...
  if (old_cost < new_cost)
    {
      ...
      return false;
    }

The usage of RTX costs in this patch is similar.

I had run X86-64 bootstrap and regression tests with
--enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java

And ARM bootstrap and regression tests with
--enable-languages=c,c++,fortran,lto,objc,obj-c++

I will run tests on i686. What other tests do you think I have to run?

> What kind of performance difference do you see?

I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm
backend changes). Coremark with some options show >10% performance
improvement. dhrystone is a little better. Some wave in eembc, but
overall result is better.

I will run spec2000 on X86-64 and ARM, and back to you about the
performance changes.

Thanks!
-Zhenqiang

> Thanks,
> Richard.
>
>>    /* Usually we substitute easy stuff, so we won't copy everything.
>>       We however need to take care to not duplicate non-trivial CONST
>> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>    to = copy_rtx (to);
>>
>>    validate_replace_src_group (from, to, insn);
>> +
>> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
>> +     And it can be shared by different references.  So skip propagation if
>> +     it makes INSN's rtx cost higher.  */
>> +  if (set && !copy_p && CONSTANT_P (to))
>> +    {
>> +      int new_cost = set_rtx_cost (set, speed);
>> +      if (new_cost > old_cost)
>> +       {
>> +         cancel_changes (0);
>> +         return false;
>> +       }
>> +    }
>> +
>>    if (num_changes_pending () && apply_change_group ())
>>      success = 1;


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]