This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Bin Cheng <Bin dot Cheng at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Thu, 11 Feb 2016 17:59:17 +0000
- Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
- Authentication-results: sourceware.org; auth=none
- References: <HE1PR08MB050748A21B9A8BA2CEBA3D21E7D60 at HE1PR08MB0507 dot eurprd08 dot prod dot outlook dot com> <56BC34CE dot 20100 at redhat dot com>
On Thu, Feb 11, 2016 at 7:14 AM, Jeff Law <law@redhat.com> wrote:
> On 02/09/2016 04:08 AM, Bin Cheng wrote:
>>
>> Hi,
>> When counting cost for loop inv, GCC checks if a loop inv can be
>> propagated into its use site (a memory reference). If it cannot be
>> propagated, we increase its cost so that it's expensive enough to be hoisted
>> out of loop. Currently we simply replace loop inv register in the use site
>> with its definition expression, then call validate_changes to check if the
>> result insn is valid. This is weak because validate_changes doesn't take
>> canonicalization into consideration. Given below example:
>>
>> Loop inv def:
>> 69: r149:SI=r87:SI+const(unspec[`xxxx'] 1)
>> REG_DEAD r87:SI
>> Loop inv use:
>> 70: r150:SI=[r90:SI*0x4+r149:SI]
>> REG_DEAD r149:SI
>>
>> The address expression after propagation is "r90 * 0x4 + (r87 +
>> const(unspec[`xxxx']))". Function validate_changes simply returns false to
>> it. As a matter of fact, the propagation is feasible if we canonicalize
>> address expression into the form like "(r90 * 0x4 + r87) +
>> const(unspec[`xxxx'])".
>>
>> This patch fixes the problem by canonicalizing address expression and
>> verifying if the new addr is valid. The canonicalization follows GCC insn
>> canonicalization rules. The test case from bugzilla PR is also included.
>> As for the canonicalize_address interface, there is another
>> canonicalize_address in fwprop.c which only changes shift into mult. I
>> think it would be good to factor out a common RTL interface in GCC, but
>> that's stage1 work.
>
>
> Also note there's bits in combine that will canonicalize appropriate shifts
> into mults. Clearly there's a need for some generalized routines to take a
> fairly generic address and perform canonicalizations and simplifications on
> it.
>
>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2016-02-09 Bin Cheng<bin.cheng@arm.com>
>>
>> PR tree-optimization/69052
>> * loop-invariant.c (canonicalize_address): New function.
>> (inv_can_prop_to_addr_use): Check validity of address expression
>> which is canonicalized by above function.
>>
>> gcc/testsuite/ChangeLog
>> 2016-02-09 Bin Cheng<bin.cheng@arm.com>
>>
>> PR tree-optimization/69052
>> * gcc.target/i386/pr69052.c: New test.
>>
>>
>> pr69052-20160204.txt
>>
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index 707f044..157e273 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn
>> *insn, bitmap depends_on,
>> return inv;
>> }
>>
>> +/* Returns a canonical version address for X. It identifies
>> + addr expr in the form of A + B + C. Following instruction
>> + canonicalization rules, MULT operand is moved to the front,
>> + CONST operand is moved to the end; also PLUS operators are
>> + chained to the left. */
>> +
>> +static rtx
>> +canonicalize_address (rtx x)
>> +{
>> + rtx op0, op1, op2;
>> + machine_mode mode = GET_MODE (x);
>> + enum rtx_code code = GET_CODE (x);
>> +
>> + if (code != PLUS)
>> + return x;
>> +
>> + /* Extract operands from A + B (+ C). */
>> + if (GET_CODE (XEXP (x, 0)) == PLUS)
>> + {
>> + op0 = XEXP (XEXP (x, 0), 0);
>> + op1 = XEXP (XEXP (x, 0), 1);
>> + op2 = XEXP (x, 1);
>> + }
>> + else if (GET_CODE (XEXP (x, 1)) == PLUS)
>> + {
>> + op0 = XEXP (x, 0);
>> + op1 = XEXP (XEXP (x, 1), 0);
>> + op2 = XEXP (XEXP (x, 1), 1);
>> + }
>> + else
>> + {
>> + op0 = XEXP (x, 0);
>> + op1 = XEXP (x, 1);
>> + op2 = NULL_RTX;
>> + }
>> +
>> + /* Move MULT operand to the front. */
>> + if (!REG_P (op1) && !CONST_INT_P (op1))
>> + std::swap (op0, op1);
>
> This feels a bit hack-ish in the sense that you already know the form of the
> RTL you're expecting and just assume that you'll be given something of that
> form, but no more complex.
>
> ISTM you're better off walking the whole rtx, recording the tidbits as you
> go into a vec. If you see something unexpected during that walk, you punt
> canonicalization of the whole expression.
>
> You then sort the vec. You want to move things like MULT to the start and
> all the constants to the end I think.
>
> You then do simplifications, particularly on the constants, but there may be
> something useful to do with MULT terms as well. You could also arrange to
> rewrite ASHIFTs into MULTs at this stage.
>
> Then you generate a new equivalent expression from the simplified operands
> in the vec.
>
> You might look at tree-ssa-reassoc for ideas on implementation details.
>
> Initially just use it in the LICM code, but I think given that kind of
> structure it'd be generally useful to replace bits of combine and fwprop
>
> If your contention is that only a few forms really matter, then I'd like to
> see those forms spelled out better in the comment and some kind of checking
> that we have reasonable incoming RTL.
>
>
>> +
>> + /* Move CONST operand to the end. */
>> + if (CONST_INT_P (op0))
>> + std::swap (op0, op1);
>
> You might want to check CONSTANT_P here. Maybe it doesn't matter in
> practice, but things like (plus (plus (symbol-ref) (const_int) const_int))
>
> That also gives you a fighting chance at extending this to handle
> HIGH/LO_SUM which are going to appear on the RISCy targets.
>
> So I think the concept of making sure we're passing canonical RTL to the
> verification step is good, I'm a bit concerned about the implementation of
> the canonicalization step.
Hi Jeff,
Thanks for detailed review. I also think a generic canonical
interface for RTL is much better. I will give it a try. But with
high chance it's a next stage1 stuff.
Thanks,
bin
>
> Jeff