[RFA][tree-optimization/90883] Improve DSE to handle redundant calls

Jeff Law law@redhat.com
Wed Jun 26 18:13:00 GMT 2019


On 6/26/19 5:53 AM, Richard Biener wrote:
> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law <law@redhat.com> wrote:
>>
>> So based on the conversation in the BZ I cobbled together a patch to
>> extend tree-ssa-dse.c to also detect redundant stores.
>>
>> To be clear, given two stores, the first store is dead if the later
>> store overwrites all the live bytes set by the first store.   In this
>> case we delete the first store.  If the first store is partially dead we
>> may trim it.
>>
>> Given two stores, the second store is redundant if it stores _the same
>> value_ into locations set by the first store.  In this case we delete
>> the second store.
>>
>>
>> We prefer to remove redundant stores over removing dead or trimming
>> partially dead stores.
>>
>> First, if we detect a redundant store, we can always remove it.  We may
>> not always be able to trim a partially dead store.  So removing the
>> redundant store wins in this case.
>>
>> But even if the redundant store occurs at the head or tail of the prior
>> store, removing the redundant store is better than trimming the
>> partially dead store because we end up with fewer calls to memset with
>> the same number of total bytes written.
>>
>> We only look for redundant stores in a few cases.  The first store must
>> be a memset, empty constructor or calloc call -- ie things which
>> initialize multiple memory locations to zero.  Subsequent stores can
>> occur via memset, empty constructors or simple memory assignments.
>>
>> The chagne to tree-ssa-alias.c deserves a quick note.
>>
>> When we're trying to determine if we have a redundant store, we create
>> an AO_REF for the *second* store, then ask the alias system if the first
>> store would kill the AO_REF.
>>
>> So while normally a calloc wouldn't ever kill anything in normal
>> execution order, we're not asking about things in execution order.  We
>> really just want to know if the calloc is going to write into the
>> entirety of the AO_REF of the subsequent store.  So we compute the size
>> of the allocation and we know the destination from the LHS of the calloc
>> call and everything "just works".
> 
> I see how stmt_kills_ref_p is convenient here and it's the only
> ref-must-include-other-ref kind of query the oracle supports right now.
> Note it is not optimized for your particular case querying the same
> stmt for multiple refs.  It's not refs_must_alias_p that is missing
> but something stronger ('kills' is also wrong since both refs might
> be reads), ref_covered_by_ref_p or so.  That said, factoring
> stmt_kills_ref_p might not be so straight-forward for calls since
> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,
> ao_ref_init_loads_from_call?).
> 
> So I think the tree-ssa-alias.c change is fine but please put a
> comment before
> 
> +             if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
> +               {
> +                 tree arg0 = gimple_call_arg (stmt, 0);
> 
> explaining this is used by DSE to detect redundant stores.
> 
>> This patch also includes a hunk I apparently left out from yesterday's
>> submission which just adds _CHK cases to all the existing BUILT_IN_MEM*
>> cases.  That's what I get for writing this part first, then adding the
>> _CHK stuff afterwards, then reversing the order of submission.
>>
>> This includes a slightly reduced testcase from the BZ in g++.dg -- it's
>> actually a good way to capture when one empty constructor causes another
>> empty constructor to be redundant.  The gcc.dg cases capture other
>> scenarios.
>>
>> This has been bootstrapped and regression tested on x86-64, i686, ppc64,
>> ppc64le, sparc64 & aarch64.  It's also bootstrapped on various arm
>> targets, alpha, m68k, mips, riscv64, sh4.  It's been built and tested on
>> a variety of *-elf targets as well as various *-linux-gnu targets as
>> crosses.  ANd just for giggles it was tested before the changes to add
>> the _CHK support, so it works with and without that as well.
>>
>> OK for the trunk?
> 
> I also notice we wouldn't handle
> 
>    memset(p, 1, 64);
>    memset(p, 1, 32);
or even just *p = 1.

> 
> (non-zero-initializer) or
> 
>   x = {};
>   y = {};
>   x.a = {};
> 
> (intermediate non-aliasing store)
Correct in both cases.  Handling the first wouldn't be terribly hard to
implement, I can probably cobble that together quickly to see if it even
triggers.  THe memset-memset case was the least commonly detected IIRC.

The second would likely require more tracking and some of the other
complexities we handle for dead stores.  Possible?  Certainly, not sure
if it's worth the effort.

> 
> Did you see if / how often this triggers on trunk?
They triggered a couple hundred times.    I didn't initially implement
the calloc case, but found it reported in the llvm bug database.  It
triggered more often than I anticipated.

Jeff



More information about the Gcc-patches mailing list