This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: "Bin.Cheng" <amker dot cheng at gmail dot com>
- Cc: Bin Cheng <Bin dot Cheng at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 4 Jul 2017 13:19:39 +0200
- Subject: Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom
- Authentication-results: sourceware.org; auth=none
- References: <VI1PR0802MB217683AA9B0CFC24B6849ACAE7DC0@VI1PR0802MB2176.eurprd08.prod.outlook.com> <CAFiYyc0WfFFcB+epL7mciVc29dmrmsWN5Gja8-eGt0dJtpR_rA@mail.gmail.com> <CAHFci2_xmxAVdYFvctf9fKGvpM5ZpOfv6ru310wmhBsQ-Neqcw@mail.gmail.com>
On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> For the moment, tree-predcom.c only supports invariant/load-loads/store-loads chains.
>>> This patch generalizes dead store elimination (or store motion) across loop iterations in
>>> predictive commoning pass by supporting store-store chain. As comment in the patch:
>>>
>>> Apart from predictive commoning on Load-Load and Store-Load chains, we
>>> also support Store-Store chains -- stores killed by other store can be
>>> eliminated. Given below example:
>>>
>>> for (i = 0; i < n; i++)
>>> {
>>> a[i] = 1;
>>> a[i+2] = 2;
>>> }
>>>
>>> It can be replaced with:
>>>
>>> t0 = a[0];
>>> t1 = a[1];
>>> for (i = 0; i < n; i++)
>>> {
>>> a[i] = 1;
>>> t2 = 2;
>>> t0 = t1;
>>> t1 = t2;
>>> }
>>> a[n] = t0;
>>> a[n+1] = t1;
>>>
>>> If the loop runs more than 1 iterations, it can be further simplified into:
>>>
>>> for (i = 0; i < n; i++)
>>> {
>>> a[i] = 1;
>>> }
>>> a[n] = 2;
>>> a[n+1] = 2;
>>>
>>> The interesting part is this can be viewed either as general store motion
>>> or general dead store elimination in either intra/inter-iterations way.
>>>
>>> There are number of interesting facts about this enhancement:
>>> a) This patch supports dead store elimination for both across-iteration case and single-iteration
>>> case. For the latter, it is dead store elimination.
>>> b) There are advantages supporting dead store elimination in predcom, for example, it has
>>> complete information about memory address. On the contrary, DSE pass can only handle
>>> memory references with exact the same memory address expression.
>>> c) It's cheap to support store-stores chain in predcom based on existing code.
>>> d) As commented, the enhancement can be viewed as either generalized dead store elimination
>>> or generalized store motion. I prefer DSE here.
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64. Is it OK?
>>
>> Looks mostly ok. I have a few questions though.
>>
>> + /* Don't do store elimination if loop has multiple exit edges. */
>> + bool eliminate_store_p = single_exit (loop) != NULL;
>>
>> handling this would be an enhancement? IIRC LIM store-motion handles this
>> just fine by emitting code on all exits.
> It is an enhancement with a little bit more complication. We would
> need to setup/record finalizer memory references for different exit
> edges. I added TODO description for this (and following one). Is it
> okay to pick up this in the future?
Yes.
>>
>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains)
>> {
>> if (chain->type == CT_INVARIANT)
>> continue;
>> + /* Don't unroll when eliminating stores. */
>> + else if (chain->type == CT_STORE_STORE)
>> + return 1;
>>
>> this is a hard exit value so we do not handle the case where another chain
>> in the loop would want to unroll? (enhancement?) I'd have expected to do
>> the same as for CT_INVARIANT here.
> I didn't check what change is needed in case of unrolling. I am not
> very sure if we should prefer unroll for *load chains or prefer not
> unroll for store-store chains, because unroll in general increases
> loop-carried register pressure for store-store chains rather than
> decreases register pressure for *load chains.
> I was also thinking if it's possible to restrict unrolling somehow in
> order to enable predcom at O2. BTW, this is not common, it only
> happens once in spec2k6 with factor forced to 1. So okay if as it is
> now?
I think it is ok for now with a TODO added. Please change the comment
to "we can't handle unrolling when eliminating stores" (it's not clear if we
can -- did you try? maybe add a testcase that would ICE)
>
>>
>> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
>> + if (!chain->all_always_accessed && tree_could_trap_p (init))
>> + {
>> + gimple_seq_discard (stmts);
>> + return false;
>>
>> so this is the only place that remotely cares for not always performed stores.
>> But as-is the patch doesn't seem to avoid speculating stores and thus
>> violates the C++ memory model, aka, introduces store-data-races? The LIM
>> store-motion code was fixed to avoid this by keeping track of whether a BB
>> has executed to guard the stores done in the compensation code on the loop
>> exit.
>>
>> That said, to "fix" this all && tree_could_trap_p cases would need to be removed
>> (or similarly flag vars be introduced). Speculating loads that do not
>> trap is ok
>> (might only introduce false uninit use reports by tools like valgrind).
> Hmm, not sure IIUC. Patch updated, is it correct (though conservative)?
+static bool
+prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
+{
...
+ tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
+ if (!chain->all_always_accessed && tree_could_trap_p (init))
+ {
remove && tree_could_trap_p and hoist the other check.
Same in prepare_finalizers_chain. I think you should check
if (! chain->all_always_accessed
&& ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
return false;
at the beginning of both functions and retain
if (! chain->all_always_accessed && tree_could_trap_p ())
in the loops.
Ok with that change and a testcase that would exercise failure/ICE of
store chains w/ unrolling.
Thanks,
Richard.
> Thanks,
> bin
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-06-21 Bin Cheng <bin.cheng@arm.com>
>>>
>>> * tree-predcom.c: Revise general description of pass.
>>> (enum chain_type): New enum type for store elimination.
>>> (struct chain): New field supporting store elimination.
>>> (dump_chain): Dump store-stores chain.
>>> (release_chain): Release resources.
>>> (split_data_refs_to_components): Compute and create component
>>> contains only stores for elimination.
>>> (get_chain_last_ref_at): New function.
>>> (make_invariant_chain): Initialization.
>>> (make_rooted_chain): Specify chain type in parameter.
>>> (add_looparound_copies): Skip for store-stores chain.
>>> (determine_roots_comp): Compute type of chain and pass it to
>>> make_rooted_chain.
>>> (initialize_root_vars_store_elim_2): New function.
>>> (finalize_eliminated_stores): New function.
>>> (remove_stmt): Handle store for elimination.
>>> (execute_pred_commoning_chain): Execute predictive commoning on
>>> store-store chains.
>>> (determine_unroll_factor): Skip unroll for store-stores chain.
>>> (prepare_initializers_chain_store_elim): New function.
>>> (prepare_initializers_chain): Hanlde store-store chain.
>>> (prepare_finalizers_chain, prepare_finalizers): New function.
>>> (tree_predictive_commoning_loop): Return integer value indicating
>>> if loop is unrolled or lcssa form is corrupted.
>>> (tree_predictive_commoning): Rewrite for lcssa form if necessary.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2017-06-21 Bin Cheng <bin.cheng@arm.com>
>>>
>>> * gcc.dg/tree-ssa/predcom-dse-1.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-2.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-3.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-4.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-5.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-6.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-7.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-8.c: New test.
>>> * gcc.dg/tree-ssa/predcom-dse-9.c: New test.