[PATCH ARM] Refine scaled address expression on ARM

Bin.Cheng amker.cheng@gmail.com
Fri Nov 29 12:31:00 GMT 2013


On Fri, Nov 29, 2013 at 6:44 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Nov 29, 2013 at 8:52 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>>> On 18/09/13 10:15, bin.cheng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>>>>>> owner@gcc.gnu.org] On Behalf Of bin.cheng
>>>>>> Sent: Monday, September 02, 2013 3:09 PM
>>>>>> To: Richard Earnshaw
>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Earnshaw
>>>>>>> Sent: Thursday, August 29, 2013 9:06 PM
>>>>>>> To: Bin Cheng
>>>>>>> Cc: gcc-patches@gcc.gnu.org
>>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>
>>>>>>> On 28/08/13 08:00, bin.cheng wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This patch refines scaled address expression on ARM.  It supports
>>>>>>>> "base+index*scale" in arm_legitimate_address_outer_p.  It also tries
>>>>>>>> to legitimize "base + index * scale + offset" with "reg <- base +
>>>>>>>> offset;  reg
>>>>>>>> + index * scale" by introducing thumb2_legitimize_address.  For now
>>>>>>>> + function
>>>>>>>> thumb2_legitimize_address is a kind of placeholder and just does the
>>>>>>>> mentioned transformation by calling try_multiplier_address.
>>>>>>>> Hopefully we can improve it in the future.
>>>>>>>>
>>>>>>>> With this patch:
>>>>>>>> 1) "base+index*scale" is recognized.
>>>>>>>
>>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form.
>>>>>>>  So this shouldn't be necessary.  Can you identify where this
>>>>>> non-canonical form is being generated?
>>>>>>>
>>>>>>
>>>>>> Oh, for now IVOPT constructs "index*scale" to test whether the backend
>>>>>> supports scaled addressing modes, which is not valid on ARM, so I was going
>>>>>> to construct "base + index*scale" instead.  Since "base + index * scale"
>>>>> is not
>>>>>> canonical form, I will construct the canonical form and drop this part of
>>>>> the
>>>>>> patch.
>>>>>>
>>>>>> Is the rest of this patch OK?
>>>>>>
>>>>> Hi Richard, I removed the part over which you concerned and created this
>>>>> updated patch.
>>>>>
>>>>> Is it OK?
>>>>>
>>>>> Thanks.
>>>>> bin
>>>>>
>>>>> 2013-09-18  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>       * config/arm/arm.c (try_multiplier_address): New function.
>>>>>       (thumb2_legitimize_address): New function.
>>>>>       (arm_legitimize_address): Call try_multiplier_address and
>>>>>       thumb2_legitimize_address.
>>>>>
>>>>>
>>>>> 6-arm-scaled_address-20130918.txt
>>>>>
>>>>>
>>>>> Index: gcc/config/arm/arm.c
>>>>> ===================================================================
>>>>> --- gcc/config/arm/arm.c      (revision 200774)
>>>>> +++ gcc/config/arm/arm.c      (working copy)
>>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>>>>>      }
>>>>>  }
>>>>>
>>>>> +/* Try to find address expression like base + index * scale + offset
>>>>> +   in X.  If we find one, force base + offset into register and
>>>>> +   construct new expression reg + index * scale; return the new
>>>>> +   address expression if it's valid.  Otherwise return X.  */
>>>>> +static rtx
>>>>> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> +  rtx tmp, base_reg, new_rtx;
>>>>> +  rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX;
>>>>> +
>>>>> +  gcc_assert (GET_CODE (x) == PLUS);
>>>>> +
>>>>> +  /* Try to find and record base/index/scale/offset in X. */
>>>>> +  if (GET_CODE (XEXP (x, 1)) == MULT)
>>>>> +    {
>>>>> +      tmp = XEXP (x, 0);
>>>>> +      index = XEXP (XEXP (x, 1), 0);
>>>>> +      scale = XEXP (XEXP (x, 1), 1);
>>>>> +      if (GET_CODE (tmp) != PLUS)
>>>>> +     return x;
>>>>> +
>>>>> +      base = XEXP (tmp, 0);
>>>>> +      offset = XEXP (tmp, 1);
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      tmp = XEXP (x, 0);
>>>>> +      offset = XEXP (x, 1);
>>>>> +      if (GET_CODE (tmp) != PLUS)
>>>>> +     return x;
>>>>> +
>>>>> +      base = XEXP (tmp, 0);
>>>>> +      scale = XEXP (tmp, 1);
>>>>> +      if (GET_CODE (base) == MULT)
>>>>> +     {
>>>>> +       tmp = base;
>>>>> +       base = scale;
>>>>> +       scale = tmp;
>>>>> +     }
>>>>> +      if (GET_CODE (scale) != MULT)
>>>>> +     return x;
>>>>> +
>>>>> +      index = XEXP (scale, 0);
>>>>> +      scale = XEXP (scale, 1);
>>>>> +    }
>>>>> +
>>>>> +  if (CONST_INT_P (base))
>>>>> +    {
>>>>> +      tmp = base;
>>>>> +      base = offset;
>>>>> +      offset = tmp;
>>>>> +    }
>>>>> +
>>>>> +  if (CONST_INT_P (index))
>>>>> +    {
>>>>> +      tmp = index;
>>>>> +      index = scale;
>>>>> +      scale = tmp;
>>>>> +    }
>>>>> +
>>>>> +  /* ARM only supports constant scale in address.  */
>>>>> +  if (!CONST_INT_P (scale))
>>>>> +    return x;
>>>>> +
>>>>> +  if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
>>>>> +    return x;
>>>>> +
>>>>> +  /* Only register/constant are allowed in each part.  */
>>>>> +  if (!symbol_mentioned_p (base)
>>>>> +      && !symbol_mentioned_p (offset)
>>>>> +      && !symbol_mentioned_p (index)
>>>>> +      && !symbol_mentioned_p (scale))
>>>>> +    {
>>>>
>>>> It would be easier to do this at the top of the function --
>>>>   if (symbol_mentioned_p (x))
>>>>     return x;
>>>>
>>>>
>>>>> +      /* Force "base+offset" into register and construct
>>>>> +      "register+index*scale".  Return the new expression
>>>>> +      only if it's valid.  */
>>>>> +      tmp = gen_rtx_PLUS (SImode, base, offset);
>>>>> +      base_reg = force_reg (SImode, tmp);
>>>>> +      tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
>>>>> +      new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
>>>>> +      return new_rtx;
>>>>
>>>> I can't help thinking that this is backwards.  That is, you want to
>>>> split out the mult expression and use offset addressing in the address
>>>> itself.  That's likely to lead to either better CSE, or more induction
>>> Thanks for your review.
>>>
>>> Actually base+offset is more likely to be loop invariant than the scaled
>>> expression, just as reported in PR57540.  The bug is observed in
>>> SPEC2000 bzip2/gzip, and hurts ARM in hot loops.  The loop induction
>>> variable doesn't matter that much compared to the invariant because we
>>> are in RTL now.
>>>
>>>> vars.  Furthermore, scaled addressing modes are likely to be more
>>>> expensive than simple offset addressing modes on at least some cores.
>>> The purpose is to enable CSE (as you pointed out) or to hoist the loop
>>> invariant.  As far as addressing modes are concerned, though we may guide
>>> the transformation once we have a reliable address cost model, we don't
>>> know whether base+offset is invariant, so it's difficult to handle in the
>>> backend, right?
>>>
>>> What do you think about this?
>>>
>>
>> Additionally, there is no induction variable issue here.  The memory
>> references we face during expand were not lowered by gimple IVOPT,
>> which means they are either outside loops or don't have induction
>> variable addresses.
>>
>> Generally, there are three passes which lower memory references:
>> gimple strength reduction picks up opportunities outside loops; gimple
>> IVOPT handles references with induction variable addresses; references
>> not handled by SLSR/IVOPT are left to RTL expand, which makes bad
>> choices in address re-association.  I think Yufeng also noticed the
>> problem and is trying patches like:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html
>>
>> After thinking twice, I tend to think we should not re-associate
>> addresses during expand, because of the lack of context information.
>> Take base + scaled_index + offset from PR57540 as an example: we just
>> can't tell whether "base+offset" is loop invariant from either the
>> backend or the RTL expander.
>>
>> Any comments?
>
> The issue is that re-association and address lowering is not integrated
> with optimizers such as CSE and invariant motion.  This means it's
> hard to arrive at optimal results and all passes are just "guessing"
> and applying heuristics that the programmer thought are appropriate
> for his machine.
>
> I have no easy way out but think that we lower addresses too late
> (at least fully).  Loop optimizers would like to see the complex
> memory reference forms but starting with IVOPTs lowering all
> addresses would likely be a win in the end (there are CSE, LIM
> and re-association passes around after/at this point as well).
>
> Back in the 4.3 days when I for the first time attempted to introduce
> MEM_REF I forcefully lowered all memory accesses and addresses
> very early (during gimplification basically).  That didn't work out well.
>
> So my suggestion to anyone who wants to try something is
> to apply an additional lowering to the whole GIMPLE, lowering
> things like handled-component references and addresses
> (and also things like bitfield accesses).
Yes, once in the expander there is not enough information.  Since we
don't need to handle references with induction variable addresses
after IVOPT, it's not difficult to take invariants into consideration
when re-associating.  Yet I have no clue how CSE should be handled.

Nevertheless, forcing "base+scaled*index+offset" into a register and
using register-indirect addressing at expand time, as happens in
PR57540, is not good.

Thanks,
bin

-- 
Best Regards.


