This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Improve canonicalisation of TARGET_MEM_REFs


On Mon, Nov 20, 2017 at 11:02 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Nov 7, 2017 at 7:04 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Fri, Nov 3, 2017 at 5:32 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> A general TARGET_MEM_REF is:
>>>>
>>>>     BASE + STEP * INDEX + INDEX2 + OFFSET
>>>>
>>>> After classifying the address in this way, the code that builds
>>>> TARGET_MEM_REFs tries to simplify the address until it's valid
>>>> for the current target and for the mode of memory being addressed.
>>>> It does this in a fixed order:
>>>>
>>>> (1) add SYMBOL to BASE
>>>> (2) add INDEX * STEP to the base, if STEP != 1
>>>> (3) add OFFSET to INDEX or BASE (reverted if unsuccessful)
>>>> (4) add INDEX to BASE
>>>> (5) add OFFSET to BASE
>>>>
>>>> So suppose we had an address:
>>>>
>>>>     &symbol + offset + index * 8   (e.g. "a[i + 1]" for a global "a")
>>>>
>>>> on a target that only allows an index or an offset, not both.  Following
>>>> the steps above, we'd first create:
>>>>
>>>>     tmp = symbol
>>>>     tmp2 = tmp + index * 8
>>>>
>>>> Then if the given offset value was valid for the mode being addressed,
>>>> we'd create:
>>>>
>>>>     MEM[base:tmp2, offset:offset]
>>>>
>>>> while if it was invalid we'd create:
>>>>
>>>>     tmp3 = tmp2 + offset
>>>>     MEM[base:tmp3, offset:0]
>>>>
>>>> The problem is that this could happen if ivopts had decided to use
>>>> a scaled index for an address that happens to have a constant base.
>>>> The old procedure failed to give an indexed TARGET_MEM_REF in that case,
>>>> and adding the offset last prevented later passes from being able to
>>>> fold the index back in.
>>>>
>>>> The patch avoids this by skipping (2) if BASE + INDEX * STEP
>>>> is a legitimate address and if OFFSET is stopping the address
>>>> being valid.
>>>>
>>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>>> OK to install?
>>>>
>>>> Richard
>>>>
>>>>
>>>> 2017-10-31  Richard Sandiford  <richard.sandiford@linaro.org>
>>>>             Alan Hayward  <alan.hayward@arm.com>
>>>>             David Sherwood  <david.sherwood@arm.com>
>>>>
>>>> gcc/
>>>>         * tree-ssa-address.c (keep_index_p): New function.
>>>>         (create_mem_ref): Use it.  Only split out the INDEX * STEP
>>>>         component if that is invalid even with the symbol and offset
>>>>         removed.
>>>>
>>>> Index: gcc/tree-ssa-address.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-address.c      2017-11-03 12:15:44.097060121 +0000
>>>> +++ gcc/tree-ssa-address.c      2017-11-03 12:21:18.060359821 +0000
>>>> @@ -746,6 +746,20 @@ gimplify_mem_ref_parts (gimple_stmt_iter
>>>>                                              true, GSI_SAME_STMT);
>>>>  }
>>>>
>>>> +/* Return true if the STEP in PARTS gives a valid BASE + INDEX * STEP
>>>> +   address for type TYPE and if the offset is making it appear invalid.  */
>>>> +
>>>> +static bool
>>>> +keep_index_p (tree type, mem_address parts)
>>>
>>> mem_ref_valid_without_offset_p (...)
>>>
>>> ?
>>
>> OK.
>>
>>>> +{
>>>> +  if (!parts.base)
>>>> +    return false;
>>>> +
>>>> +  gcc_assert (!parts.symbol);
>>>> +  parts.offset = NULL_TREE;
>>>> +  return valid_mem_ref_p (TYPE_MODE (type), TYPE_ADDR_SPACE (type), &parts);
>>>> +}
>>>> +
>>>>  /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
>>>>     computations are emitted in front of GSI.  TYPE is the mode
>>>>     of created memory reference. IV_CAND is the selected iv candidate in ADDR,
>>>> @@ -809,7 +823,8 @@ create_mem_ref (gimple_stmt_iterator *gs
>>>
>>> Which means all of the following would be more naturally written as
>>>
>>>>       into:
>>>>         index' = index << step;
>>>>         [... + index' + ,,,].  */
>>>> -  if (parts.step && !integer_onep (parts.step))
>>>> +  bool scaled_p = (parts.step && !integer_onep (parts.step));
>>>> +  if (scaled_p && !keep_index_p (type, parts))
>>>>      {
>>>
>>>   if (mem_ref_valid_without_offset_p (...))
>>>    {
>>>      ...
>>>      return create_mem_ref_raw (...);
>>>    }
>>
>> Is this inside the test for a scale:
>>
>>   if (parts.step && !integer_onep (parts.step))
>>     {
>>       if (mem_ref_valid_without_offset_p (...))
>>         {
>>           tree tmp = parts.offset;
>>           if (parts.base)
>>             {
>>               tmp = fold_build_pointer_plus (parts.base, tmp);
>>               tmp = force_gimple_operand_gsi_1 (gsi, tmp,
>>                                                 is_gimple_mem_ref_addr,
>>                                                 NULL_TREE, true,
>>                                                 GSI_SAME_STMT);
>>             }
>>           parts.base = tmp;
>>           parts.offset = NULL_TREE;
>>           mem_ref = create_mem_ref_raw (type, alias_ptr_type, &parts, true);
>>           gcc_assert (mem_ref);
>>           return mem_ref;
>>         }
>>       ...current code...
>>     }
>>
>> ?  I can do that if you don't mind the cut-&-paste.  Or I could
>> split the code that adds the offset out into a subroutine.
>
> Sorry for the late response.  Yes, sth like that.
>
>> If we do it outside the check for a scaled index, we'd need to
>> duplicate the logic about whether to add the offset to the index
>> or the base.
>>
>>> that said, the function should ideally be re-written to try a few more
>>> options non-destructively.  Doesn't IVOPTs itself already verify
>>> the TARGET_MEM_REFs it generates (and costs!) are valid?
>>
>> I think it only evaluates the cost of the address and leaves this
>> routine to make them valid.  (And the costs are usually right for
>> SVE after the ivopts patch I posted.)  We'd probably have to duplicate
>> the same logic if ivopts legitimised the addresses itself.
>
> Hmm.  Just wonder how it can compute costs when it might not
> be legitimized yet.  And if legitimizing doesn't matter for costs

I'd say IVOPTs tries to legitimize TARGET_MEM_REFs during cost
computation, but only in a simplified way (see below).  The whole
idea is:
1) IVOPTs computes the address expression in the form
   addr_invariant + IV * scale, where scale can be 1.
2) IVOPTs tries to distribute addr_invariant into the different parts
   of a mem_address (see the sketch after this list), generating the
   memory address as base + IV * scale + offset.  This step is where
   legitimization happens, and it should be equivalent to what
   create_mem_ref does.  Unfortunately, it only mimics
   create_mem_ref/addr_to_parts in a simplified way, for the reasons
   listed below.  Still, I think in most cases IVOPTs'
   get_address_cost does arrive at the legitimate form of the
   address; if it always did, we could simply reuse that form when
   rewriting the address at the end.
3) When rewriting the address, IVOPTs calls create_mem_ref to
   generate a legitimate TARGET_MEM_REF with the same form (part
   distribution) as the one costed in step 2).
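
For reference, a minimal sketch of those parts, mirroring struct
mem_address from tree-ssa-address.h (the field names are GCC's; the
example decomposition in the comment is mine):

  /* A TARGET_MEM_REF address is
       SYMBOL + BASE + INDEX * STEP + OFFSET.
     E.g. for "a[i + 1]" with a global "a" and 8-byte elements:
       symbol = &a, index = i, step = 8, offset = 8.  */
  struct mem_address
  {
    tree symbol;   /* invariant symbolic part, if any  */
    tree base;     /* invariant base part              */
    tree index;    /* variable (usually the IV) part   */
    tree step;     /* constant scale applied to index  */
    tree offset;   /* constant displacement            */
  };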

Ideally, the cost computation should do the legitimization work and
its result should simply be reused during rewriting; in that case we
could actually remove the create_mem_ref call at the end.  But:
1) Cost computation is of O(mn) complexity, so I didn't do full
   legitimization.  Instead I use a mock mem_address like
   {symbol, base = integer_one_node, index = integer_one_node, step,
   offset}.  It can be seen as legitimization with the base/index
   parts parameterized (see the sketch after this list).
2) Another reason cost computation can't do full legitimization is,
   as commented in rewrite_use_address, that we don't know the memory
   object hint before candidates are chosen.
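
Concretely, the mock mem_address in 1) would be built like this (my
own sketch, not the actual get_address_cost code; integer_one_node is
GCC's shared tree for constant 1, while sym/step/off here are
hypothetical locals):

  /* Parameterized address for costing: base and index are
     placeholder "some register" values, so only symbol, step and
     offset vary per candidate.  */
  struct mem_address parts;
  parts.symbol = sym;               /* symbolic part, if any  */
  parts.base = integer_one_node;    /* stands for any base    */
  parts.index = integer_one_node;   /* stands for any index   */
  parts.step = step;                /* candidate's step       */
  parts.offset = off;               /* candidate's offset     */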

When rewriting IVOPTs, I tried to keep the two processes above in
sync.  The most important part is that if the invariant part of the
address is costed as invariant (and that's always the case), then it
is factored out as a loop invariant by create_mem_ref rather than
mixed into the IV part.  This leads to a question about this patch.

+static bool
+keep_index_p (tree type, mem_address parts)
+{
+  if (!parts.base)
+    return false;
+
+  gcc_assert (!parts.symbol);
+  parts.offset = NULL_TREE;
+  return valid_mem_ref_p (TYPE_MODE (type), TYPE_ADDR_SPACE (type), &parts);
+}
+
Given [base + index << scale + offset], will it always be simplified
into:
  base' = base + offset
  [base' + index << scale]
even when the IV is in the base part (i.e., var_in_base is true)?  Of
course, the IV is normally in the index part.
And
@@ -832,7 +848,9 @@ create_mem_ref (gimple_stmt_iterator *gs
        index' = index + offset;
        [base + index']
      depending on which one is invariant.  */
-  if (parts.offset && !integer_zerop (parts.offset))
+  if (parts.offset
+      && !integer_zerop (parts.offset)
+      && (!var_in_base || !scaled_p))
     {
       tree old_base = unshare_expr (parts.base);
       tree old_index = unshare_expr (parts.index);
IIUC, what we want is: if index << scale should be kept, we always
fold base + offset outside of the memory expression.  The above code
seems hard to follow; would something like the below work (I didn't
verify)?
  if (parts.offset && !integer_zerop (parts.offset))
    {
      tree old_base = unshare_expr (parts.base);
      tree old_index = unshare_expr (parts.index);
      tree old_offset = unshare_expr (parts.offset);

      tmp = parts.offset;
      parts.offset = NULL_TREE;
      /* Add offset to invariant part.  */
      if (!var_in_base)    // Change to if (!var_in_base || keep_scale_p)
        {
          if (parts.base)
            {
              tmp = fold_build_pointer_plus (parts.base, tmp);
              tmp = force_gimple_operand_gsi_1 (gsi, tmp,
                                                is_gimple_mem_ref_addr,
                                                NULL_TREE, true,
                                                GSI_SAME_STMT);
            }
          parts.base = tmp;
        }

Or we could duplicate the code that merges base + offset as soon as
keep_index_p succeeds, as sketched below?
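
Something like this (my sketch only, reusing the patch's names and
the existing locals of create_mem_ref; note keep_index_p already
guarantees parts.base is non-null):

  /* As soon as the scaled index should be kept, fold the offset into
     the base and drop it, before any other simplification.  */
  if (scaled_p && keep_index_p (type, parts)
      && parts.offset && !integer_zerop (parts.offset))
    {
      tmp = fold_build_pointer_plus (parts.base, parts.offset);
      parts.base = force_gimple_operand_gsi_1 (gsi, tmp,
                                               is_gimple_mem_ref_addr,
                                               NULL_TREE, true,
                                               GSI_SAME_STMT);
      parts.offset = NULL_TREE;
    }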

Thanks,
bin

> why we can't delegitimize during RTL expansion...
>
> Richard.
>
>> Thanks,
>> Richard
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>        gcc_assert (parts.index);
>>>>        parts.index = force_gimple_operand_gsi (gsi,
>>>> @@ -821,6 +836,7 @@ create_mem_ref (gimple_stmt_iterator *gs
>>>>        mem_ref = create_mem_ref_raw (type, alias_ptr_type, &parts, true);
>>>>        if (mem_ref)
>>>>         return mem_ref;
>>>> +      scaled_p = false;
>>>>      }
>>>>
>>>>    /* Add offset to invariant part by transforming address expression:
>>>> @@ -832,7 +848,9 @@ create_mem_ref (gimple_stmt_iterator *gs
>>>>         index' = index + offset;
>>>>         [base + index']
>>>>       depending on which one is invariant.  */
>>>> -  if (parts.offset && !integer_zerop (parts.offset))
>>>> +  if (parts.offset
>>>> +      && !integer_zerop (parts.offset)
>>>> +      && (!var_in_base || !scaled_p))
>>>>      {
>>>>        tree old_base = unshare_expr (parts.base);
>>>>        tree old_index = unshare_expr (parts.index);
>>>> @@ -882,7 +900,7 @@ create_mem_ref (gimple_stmt_iterator *gs
>>>>    /* Transform [base + index + ...] into:
>>>>         base' = base + index;
>>>>         [base' + ...].  */
>>>> -  if (parts.index)
>>>> +  if (parts.index && !scaled_p)
>>>>      {
>>>>        tmp = parts.index;
>>>>        parts.index = NULL_TREE;

