This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization

From: "Bin.Cheng" <amker dot cheng at gmail dot com>
To: Jeff Law <law at redhat dot com>
Cc: Bin Cheng <Bin dot Cheng at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
Date: Thu, 11 Feb 2016 17:59:17 +0000
Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
Authentication-results: sourceware.org; auth=none
References: <HE1PR08MB050748A21B9A8BA2CEBA3D21E7D60 at HE1PR08MB0507 dot eurprd08 dot prod dot outlook dot com> <56BC34CE dot 20100 at redhat dot com>

On Thu, Feb 11, 2016 at 7:14 AM, Jeff Law <law@redhat.com> wrote:
> On 02/09/2016 04:08 AM, Bin Cheng wrote:
>>
>> Hi,
>> When counting cost for loop inv, GCC checks if a loop inv can be
>> propagated into its use site (a memory reference).  If it cannot be
>> propagated, we increase its cost so that it's expensive enough to be hoisted
>> out of loop.  Currently we simply replace loop inv register in the use site
>> with its definition expression, then call validate_changes to check if the
>> result insn is valid.  This is weak because validate_changes doesn't take
>> canonicalization into consideration.  Given below example:
>>
>>    Loop inv def:
>>     69: r149:SI=r87:SI+const(unspec[`xxxx'] 1)
>>        REG_DEAD r87:SI
>>    Loop inv use:
>>     70: r150:SI=[r90:SI*0x4+r149:SI]
>>        REG_DEAD r149:SI
>>
>> The address expression after propagation is "r90 * 0x4 + (r87 +
>> const(unspec[`xxxx']))".  Function validate_changes simply returns false to
>> it.  As a matter of fact, the propagation is feasible if we canonicalize
>> address expression into the form like "(r90 * 0x4 + r87) +
>> const(unspec[`xxxx'])".
>>
>> This patch fixes the problem by canonicalizing address expression and
>> verifying if the new addr is valid.  The canonicalization follows GCC insn
>> canonicalization rules.  The test case from bugzilla PR is also included.
>> As for the canonicalize_address interface, there is another
>> canonicalize_address in fwprop.c which only changes shift into mult.  I
>> think it would be good to factor out a common RTL interface in GCC, but
>> that's stage1 work.
>
>
> Also note there's bits in combine that will canonicalize appropriate shifts
> into mults.  Clearly there's a need for some generalized routines to take a
> fairly generic address and perform canonicalizations and simplifications on
> it.
>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2016-02-09  Bin Cheng<bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * loop-invariant.c (canonicalize_address): New function.
>>         (inv_can_prop_to_addr_use): Check validity of address expression
>>         which is canonicalized by above function.
>>
>> gcc/testsuite/ChangeLog
>> 2016-02-09  Bin Cheng<bin.cheng@arm.com>
>>
>>         PR tree-optimization/69052
>>         * gcc.target/i386/pr69052.c: New test.
>>
>>
>> pr69052-20160204.txt
>>
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index 707f044..157e273 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn
>> *insn, bitmap depends_on,
>>     return inv;
>>   }
>>
>> +/* Returns a canonical version address for X.  It identifies
>> +   addr expr in the form of A + B + C.  Following instruction
>> +   canonicalization rules, MULT operand is moved to the front,
>> +   CONST operand is moved to the end; also PLUS operators are
>> +   chained to the left.  */
>> +
>> +static rtx
>> +canonicalize_address (rtx x)
>> +{
>> +  rtx op0, op1, op2;
>> +  machine_mode mode = GET_MODE (x);
>> +  enum rtx_code code = GET_CODE (x);
>> +
>> +  if (code != PLUS)
>> +    return x;
>> +
>> +  /* Extract operands from A + B (+ C).  */
>> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +    {
>> +      op0 = XEXP (XEXP (x, 0), 0);
>> +      op1 = XEXP (XEXP (x, 0), 1);
>> +      op2 = XEXP (x, 1);
>> +    }
>> +  else if (GET_CODE (XEXP (x, 1)) == PLUS)
>> +    {
>> +      op0 = XEXP (x, 0);
>> +      op1 = XEXP (XEXP (x, 1), 0);
>> +      op2 = XEXP (XEXP (x, 1), 1);
>> +    }
>> +  else
>> +    {
>> +      op0 = XEXP (x, 0);
>> +      op1 = XEXP (x, 1);
>> +      op2 = NULL_RTX;
>> +    }
>> +
>> +  /* Move MULT operand to the front.  */
>> +  if (!REG_P (op1) && !CONST_INT_P (op1))
>> +    std::swap (op0, op1);
>
> This feels a bit hack-ish in the sense that you already know the form of the
> RTL you're expecting and just assume that you'll be given something of that
> form, but no more complex.
>
> ISTM you're better off walking the whole rtx, recording the tidbits as you
> go into a vec.  If you see something unexpected during that walk, you punt
> canonicalization of the whole expression.
>
> You then sort the vec.  You want to move things like MULT to the start and
> all the constants to the end I think.
>
> You then do simplifications, particularly on the constants, but there may be
> something useful to do with MULT terms as well.  You could also arrange to
> rewrite ASHIFTs into MULTs at this stage.
>
> Then you generate a new equivalent expression from the simplified operands
> in the vec.
>
> You might look at tree-ssa-reassoc for ideas on implementation details.
>
> Initially just use it in the LICM code, but I think given that kind of
> structure it'd be generally useful to replace bits of combine and fwprop
>
> If your contention is that only a few forms really matter, then I'd like to
> see those forms spelled out better in the comment and some kind of checking
> that we have reasonable incoming RTL.
>
>
>> +
>> +  /* Move CONST operand to the end.  */
>> +  if (CONST_INT_P (op0))
>> +    std::swap (op0, op1);
>
> You might want to check CONSTANT_P here.  Maybe it doesn't matter in
> practice, but things like (plus (plus (symbol-ref) (const_int) const_int))
>
> That also gives you a fighting chance at extending this to handle
> HIGH/LO_SUM which are going to appear on the RISCy targets.
>
> So I think the concept of making sure we're passing canonical RTL to the
> verification step is good, I'm a bit concerned about the implementation of
> the canonicalization step.
Hi Jeff,
Thanks for detailed review.  I also think a generic canonical
interface for RTL is much better.  I will give it a try.  But with
high chance it's a next stage1 stuff.

Thanks,
bin
>
> Jeff

Follow-Ups:
- Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  - From: Jeff Law

References:
- [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  - From: Bin Cheng
- Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
  - From: Jeff Law

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]