This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH ARM]Refine scaled address expression on ARM
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: Richard Earnshaw <rearnsha at arm dot com>, "bin.cheng" <bin dot cheng at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Yufeng Zhang <Yufeng dot Zhang at arm dot com>
- Date: Fri, 29 Nov 2013 12:29:21 +0100
- Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
- Authentication-results: sourceware.org; auth=none
- References: <002a01cea3bc$43d36670$cb7a3350$ at arm dot com> <521F474A dot 9000905 at arm dot com> <004701cea7ab$43227160$c9675420$ at arm dot com> <000601ceb44f$a6db2980$f4917c80$ at arm dot com> <52971F66 dot 3070307 at arm dot com> <CAHFci2_iYY5EEgYpBDAMU_3FhOggcTx0yH978iWSs5kedgpodQ at mail dot gmail dot com> <CAHFci2_8wW2hP0VrxnUgEhecDKymzdqCDkn7gbpZ9Sk9pbro9w at mail dot gmail dot com> <CAFiYyc3tTE03qTKm6MqYvypgk3ESq8a8k3hmjYse5Jmz801yOg at mail dot gmail dot com> <CAHFci29R+QQriWp3X1aegQxPJfWWATXkyRx--tQ1Ky3yE678ng at mail dot gmail dot com>
On Fri, Nov 29, 2013 at 12:03 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Nov 29, 2013 at 6:44 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Nov 29, 2013 at 8:52 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>>>> On 18/09/13 10:15, bin.cheng wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>>>>>>> owner@gcc.gnu.org] On Behalf Of bin.cheng
>>>>>>> Sent: Monday, September 02, 2013 3:09 PM
>>>>>>> To: Richard Earnshaw
>>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Richard Earnshaw
>>>>>>>> Sent: Thursday, August 29, 2013 9:06 PM
>>>>>>>> To: Bin Cheng
>>>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>>
>>>>>>>> On 28/08/13 08:00, bin.cheng wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This patch refines scaled address expression on ARM. It supports
>>>>>>>>> "base+index*scale" in arm_legitimate_address_outer_p. It also tries
>>>>>>>>> to legitimize "base + index * scale + offset" with "reg <- base +
>>>>>>>>> offset; reg + index * scale" by introducing
>>>>>>>>> thumb2_legitimize_address. For now, function
>>>>>>>>> thumb2_legitimize_address is a kind of placeholder and just does the
>>>>>>>>> mentioned transformation by calling try_multiplier_address.
>>>>>>>>> Hoping we can improve it in the future.
>>>>>>>>>
>>>>>>>>> With this patch:
>>>>>>>>> 1) "base+index*scale" is recognized.
>>>>>>>>
>>>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
>>>>>>>> So this shouldn't be necessary. Can you identify where this
>>>>>>> non-canonical form is being generated?
>>>>>>>>
>>>>>>>
>>>>>>> Oh, for now IVOPT constructs "index*scale" to test whether the
>>>>>>> backend supports scaled addressing modes. That form is not valid on
>>>>>>> ARM, so I was going to construct "base + index*scale" instead. Since
>>>>>>> "base + index * scale" is not the canonical form, I will construct
>>>>>>> the canonical form and drop this part of the patch.
>>>>>>>
>>>>>>> Is rest of this patch OK?
>>>>>>>
>>>>>> Hi Richard, I removed the part you were concerned about and created
>>>>>> this updated patch.
>>>>>>
>>>>>> Is it OK?
>>>>>>
>>>>>> Thanks.
>>>>>> bin
>>>>>>
>>>>>> 2013-09-18 Bin Cheng <bin.cheng@arm.com>
>>>>>>
>>>>>> * config/arm/arm.c (try_multiplier_address): New function.
>>>>>> (thumb2_legitimize_address): New function.
>>>>>> (arm_legitimize_address): Call try_multiplier_address and
>>>>>> thumb2_legitimize_address.
>>>>>>
>>>>>>
>>>>>> 6-arm-scaled_address-20130918.txt
>>>>>>
>>>>>>
>>>>>> Index: gcc/config/arm/arm.c
>>>>>> ===================================================================
>>>>>> --- gcc/config/arm/arm.c (revision 200774)
>>>>>> +++ gcc/config/arm/arm.c (working copy)
>>>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> +/* Try to find address expression like base + index * scale + offset
>>>>>> + in X. If we find one, force base + offset into register and
>>>>>> + construct new expression reg + index * scale; return the new
>>>>>> + address expression if it's valid. Otherwise return X. */
>>>>>> +static rtx
>>>>>> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> + rtx tmp, base_reg, new_rtx;
>>>>>> + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
>>>>>> +
>>>>>> + gcc_assert (GET_CODE (x) == PLUS);
>>>>>> +
>>>>>> + /* Try to find and record base/index/scale/offset in X. */
>>>>>> + if (GET_CODE (XEXP (x, 1)) == MULT)
>>>>>> + {
>>>>>> + tmp = XEXP (x, 0);
>>>>>> + index = XEXP (XEXP (x, 1), 0);
>>>>>> + scale = XEXP (XEXP (x, 1), 1);
>>>>>> + if (GET_CODE (tmp) != PLUS)
>>>>>> + return x;
>>>>>> +
>>>>>> + base = XEXP (tmp, 0);
>>>>>> + offset = XEXP (tmp, 1);
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + tmp = XEXP (x, 0);
>>>>>> + offset = XEXP (x, 1);
>>>>>> + if (GET_CODE (tmp) != PLUS)
>>>>>> + return x;
>>>>>> +
>>>>>> + base = XEXP (tmp, 0);
>>>>>> + scale = XEXP (tmp, 1);
>>>>>> + if (GET_CODE (base) == MULT)
>>>>>> + {
>>>>>> + tmp = base;
>>>>>> + base = scale;
>>>>>> + scale = tmp;
>>>>>> + }
>>>>>> + if (GET_CODE (scale) != MULT)
>>>>>> + return x;
>>>>>> +
>>>>>> + index = XEXP (scale, 0);
>>>>>> + scale = XEXP (scale, 1);
>>>>>> + }
>>>>>> +
>>>>>> + if (CONST_INT_P (base))
>>>>>> + {
>>>>>> + tmp = base;
>>>>>> + base = offset;
>>>>>> + offset = tmp;
>>>>>> + }
>>>>>> +
>>>>>> + if (CONST_INT_P (index))
>>>>>> + {
>>>>>> + tmp = index;
>>>>>> + index = scale;
>>>>>> + scale = tmp;
>>>>>> + }
>>>>>> +
>>>>>> + /* ARM only supports constant scale in address. */
>>>>>> + if (!CONST_INT_P (scale))
>>>>>> + return x;
>>>>>> +
>>>>>> + if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
>>>>>> + return x;
>>>>>> +
>>>>>> + /* Only register/constant are allowed in each part. */
>>>>>> + if (!symbol_mentioned_p (base)
>>>>>> + && !symbol_mentioned_p (offset)
>>>>>> + && !symbol_mentioned_p (index)
>>>>>> + && !symbol_mentioned_p (scale))
>>>>>> + {
>>>>>
>>>>> It would be easier to do this at the top of the function --
>>>>> if (symbol_mentioned_p (x))
>>>>> return x;
>>>>>
>>>>>
>>>>>> + /* Force "base+offset" into register and construct
>>>>>> + "register+index*scale". Return the new expression
>>>>>> + only if it's valid. */
>>>>>> + tmp = gen_rtx_PLUS (SImode, base, offset);
>>>>>> + base_reg = force_reg (SImode, tmp);
>>>>>> + tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
>>>>>> + new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
>>>>>> + return new_rtx;
>>>>>
>>>>> I can't help thinking that this is backwards. That is, you want to
>>>>> split out the mult expression and use offset addressing in the
>>>>> addresses themselves. That's likely to lead to either better CSE, or more induction
>>>> Thanks for your review.
>>>>
>>>> Actually base+offset is more likely to be loop invariant than the
>>>> scaled expression, just as reported in PR57540. The bug is observed
>>>> in spec2k bzip/gzip, and hurts ARM in hot loops. The loop induction
>>>> variable doesn't matter that much compared to the invariant because
>>>> we are in RTL now.
>>>>
>>>>> vars. Furthermore, scaled addressing modes are likely to be more
>>>>> expensive than simple offset addressing modes on at least some cores.
>>>> The purpose is to enable CSE (as you pointed out) or to hoist the
>>>> loop invariant. As far as the addressing mode is concerned, though
>>>> we may guide the transformation once we have a reliable address cost
>>>> model, we don't know whether base+offset is invariant, so it's
>>>> difficult to handle in the backend, right?
>>>>
>>>> What do you think about this?
>>>>
>>>
>>> Additionally, there is no induction variable issue here. The memory
>>> references we face during expand were not lowered by GIMPLE IVOPT,
>>> which means each is either outside a loop or does not have an
>>> induction variable address.
>>>
>>> Generally, there are three passes that lower memory references:
>>> GIMPLE strength reduction picks up opportunities outside loops;
>>> GIMPLE IVOPT handles references with induction variable addresses;
>>> references handled by neither SLSR nor IVOPT are left to RTL expand,
>>> which makes bad choices in address re-association. I think Yufeng
>>> also noticed the problem and is experimenting with patches like:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html
>>>
>>> After thinking twice, I rather think we should not re-associate
>>> addresses during expansion, because of the lack of context
>>> information. Take base + scaled_index + offset from PR57540 as an
>>> example: we just can't tell whether "base+offset" is loop invariant
>>> from either the backend or the RTL expander.
>>>
>>> Any comments?
>>
>> The issue is that re-association and address lowering are not
>> integrated with optimizers such as CSE and invariant motion. This
>> means it's hard to arrive at optimal results, and all passes are just
>> "guessing", applying heuristics that the programmer thought
>> appropriate for his machine.
>>
>> I see no easy way out, but I think we lower addresses too late
>> (at least fully). Loop optimizers would like to see the complex
>> memory reference forms, but lowering all addresses starting with
>> IVOPTs would likely be a win in the end (there are CSE, LIM
>> and re-association passes around after/at this point as well).
>>
>> Back in the 4.3 days, when I first attempted to introduce MEM_REF,
>> I forcefully lowered all memory accesses and addresses very early
>> (basically during gimplification). That didn't work out well.
>>
>> So my suggestion to anyone who wants to try something is
>> to apply an additional lowering to the whole GIMPLE, lowering
>> things like handled-component references and addresses
>> (and also things like bitfield accesses).
> Yes, once we are in the expander, there is not enough information.
> Since we don't need to handle references with induction variable
> addresses after IVOPT, it's not difficult to take invariants into
> consideration when re-associating. Yet I have no clue how CSE should
> be taken into account.
You'd need to detect that (a + b)*c can be computed differently
if both a*c and b*c are available for example. There is even a PR
about this somewhere.
> Nevertheless, forcing "base+scaled*index+offset" into a register and
> using register-indirect addressing in expand, as in PR57540, is not good.
Sure ... this is why we have IVOPTs and SLSR. But both would
work ok with lowered input I think and would simply build up
a "leveled" IL suitable for the target. IVOPTs also does some
simple CSE considerations AFAIK. In the end what matters for
speed is loops after all ;)
Richard.
> Thanks,
> bin
>
> --
> Best Regards.