This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom


On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> For the moment, tree-predcom.c only supports invariant/load-loads/store-loads chains.
>>> This patch generalizes dead store elimination (or store motion) across loop iterations in
>>> predictive commoning pass by supporting store-store chain.  As comment in the patch:
>>>
>>>    Apart from predictive commoning on Load-Load and Store-Load chains, we
>>>    also support Store-Store chains -- stores killed by other store can be
>>>    eliminated.  Given below example:
>>>
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>          a[i+2] = 2;
>>>        }
>>>
>>>    It can be replaced with:
>>>
>>>      t0 = a[0];
>>>      t1 = a[1];
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>          t2 = 2;
>>>          t0 = t1;
>>>          t1 = t2;
>>>        }
>>>      a[n] = t0;
>>>      a[n+1] = t1;
>>>
>>>    If the loop runs more than 1 iterations, it can be further simplified into:
>>>
>>>      for (i = 0; i < n; i++)
>>>        {
>>>          a[i] = 1;
>>>        }
>>>      a[n] = 2;
>>>      a[n+1] = 2;
>>>
>>>    The interesting part is this can be viewed either as general store motion
>>>    or general dead store elimination in either intra/inter-iterations way.
>>>
>>> There are number of interesting facts about this enhancement:
>>> a) This patch supports dead store elimination for both across-iteration case and single-iteration
>>>      case.  For the latter, it is dead store elimination.
>>> b) There are advantages supporting dead store elimination in predcom, for example, it has
>>>      complete information about memory address.  On the contrary, DSE pass can only handle
>>>      memory references with exact the same memory address expression.
>>> c) It's cheap to support store-stores chain in predcom based on existing code.
>>> d) As commented, the enhancement can be viewed as either generalized dead store elimination
>>>      or generalized store motion.  I prefer DSE here.
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>
>> Looks mostly ok.  I have a few questions though.
>>
>> +  /* Don't do store elimination if loop has multiple exit edges.  */
>> +  bool eliminate_store_p = single_exit (loop) != NULL;
>>
>> handling this would be an enhancement?  IIRC LIM store-motion handles this
>> just fine by emitting code on all exits.
> It is an enhancement with a little bit more complication.  We would
> need to setup/record finalizer memory references for different exit
> edges.  I added TODO description for this (and following one).  Is it
> okay to pick up this in the future?

Yes.

>>
>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains)
>>      {
>>        if (chain->type == CT_INVARIANT)
>>         continue;
>> +      /* Don't unroll when eliminating stores.  */
>> +      else if (chain->type == CT_STORE_STORE)
>> +       return 1;
>>
>> this is a hard exit value so we do not handle the case where another chain
>> in the loop would want to unroll? (enhancement?)  I'd have expected to do
>> the same as for CT_INVARIANT here.
> I didn't check what change is needed in case of unrolling.  I am not
> very sure if we should prefer unroll for *load chains or prefer not
> unroll for store-store chains, because unroll in general increases
> loop-carried register pressure for store-store chains rather than
> decreases register pressure for *load chains.
> I was also thinking if it's possible to restrict unrolling somehow in
> order to enable predcom at O2.  BTW, this is not common, it only
> happens once in spec2k6 with factor forced to 1.  So okay if as it is
> now?

I think it is ok for now with a TODO added.  Please change the comment
to "we can't handle unrolling when eliminating stores" (it's not clear if we
can -- did you try?  maybe add a testcase that would ICE)

>
>>
>> +      tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
>> +      if (!chain->all_always_accessed && tree_could_trap_p (init))
>> +       {
>> +         gimple_seq_discard (stmts);
>> +         return false;
>>
>> so this is the only place that remotely cares for not always performed stores.
>> But as-is the patch doesn't seem to avoid speculating stores and thus
>> violates the C++ memory model, aka, introduces store-data-races?  The LIM
>> store-motion code was fixed to avoid this by keeping track of whether a BB
>> has executed to guard the stores done in the compensation code on the loop
>> exit.
>>
>> That said, to "fix" this all && tree_could_trap_p cases would need to be removed
>> (or similarly flag vars be introduced).  Speculating loads that do not
>> trap is ok
>> (might only introduce false uninit use reports by tools like valgrind).
> Hmm, not sure IIUC.  Patch updated, is it correct (though conservative)?

+static bool
+prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
+{
...
+      tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
+      if (!chain->all_always_accessed && tree_could_trap_p (init))
+       {

remove && tree_could_trap_p and hoist the other check.

Same in prepare_finalizers_chain.  I think you should check

   if (! chain->all_always_accessed
      && ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
     return false;

at the beginning of both functions and retain

    if (! chain->all_always_accessed && tree_could_trap_p ())

in the loops.

Ok with that change and a testcase that would exercise failure/ICE of
store chains w/ unrolling.

Thanks,
Richard.

> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-06-21  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         * tree-predcom.c: Revise general description of pass.
>>>         (enum chain_type): New enum type for store elimination.
>>>         (struct chain): New field supporting store elimination.
>>>         (dump_chain): Dump store-stores chain.
>>>         (release_chain): Release resources.
>>>         (split_data_refs_to_components): Compute and create component
>>>         contains only stores for elimination.
>>>         (get_chain_last_ref_at): New function.
>>>         (make_invariant_chain): Initialization.
>>>         (make_rooted_chain): Specify chain type in parameter.
>>>         (add_looparound_copies): Skip for store-stores chain.
>>>         (determine_roots_comp): Compute type of chain and pass it to
>>>         make_rooted_chain.
>>>         (initialize_root_vars_store_elim_2): New function.
>>>         (finalize_eliminated_stores): New function.
>>>         (remove_stmt): Handle store for elimination.
>>>         (execute_pred_commoning_chain): Execute predictive commoning on
>>>         store-store chains.
>>>         (determine_unroll_factor): Skip unroll for store-stores chain.
>>>         (prepare_initializers_chain_store_elim): New function.
>>>         (prepare_initializers_chain): Hanlde store-store chain.
>>>         (prepare_finalizers_chain, prepare_finalizers): New function.
>>>         (tree_predictive_commoning_loop): Return integer value indicating
>>>         if loop is unrolled or lcssa form is corrupted.
>>>         (tree_predictive_commoning): Rewrite for lcssa form if necessary.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2017-06-21  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         * gcc.dg/tree-ssa/predcom-dse-1.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-2.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-3.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-4.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-5.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-6.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-7.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-8.c: New test.
>>>         * gcc.dg/tree-ssa/predcom-dse-9.c: New test.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]