[PR63185][RFC] Improve DSE with branches
Jeff Law
law@redhat.com
Thu May 17 01:40:00 GMT 2018
On 05/16/2018 04:12 AM, Richard Biener wrote:
> On Tue, 15 May 2018, Richard Biener wrote:
>
>> On Tue, 15 May 2018, Richard Biener wrote:
>>
>>> On Tue, 15 May 2018, Richard Biener wrote:
>>>
>>>> On Tue, 15 May 2018, Richard Biener wrote:
>>>>
>>>>> On Tue, 15 May 2018, Richard Biener wrote:
>>>>>
>>>>>> On Mon, 14 May 2018, Kugan Vivekanandarajah wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Attached patch handles PR63185 when we reach PHI with temp != NULLL.
>>>>>>> We could see the PHI and if there isn't any uses for PHI that is
>>>>>>> interesting, we could ignore that ?
>>>>>>>
>>>>>>> Bootstrapped and regression tested on x86_64-linux-gnu.
>>>>>>> Is this OK?
>>>>>>
>>>>>> No, as Jeff said we can't do it this way.
>>>>>>
>>>>>> If we end up with multiple VDEFs in the walk of defvar immediate
>>>>>> uses we know we are dealing with a CFG fork. We can't really
>>>>>> ignore any of the paths but we have to
>>>>>>
>>>>>> a) find the merge point (and the associated VDEF)
>>>>>> b) verify for each each chain of VDEFs with associated VUSEs
>>>>>> up to that merge VDEF that we have no uses of the to classify
>>>>>> store and collect (partial) kills
>>>>>> c) intersect kill info and continue walking from the merge point
>>>>>>
>>>>>> in b) there's the optional possibility to find sinking opportunities
>>>>>> in case we have kills on some paths but uses on others. This is why
>>>>>> DSE should be really merged with (store) sinking.
>>>>>>
>>>>>> So if we want to enhance DSEs handling of branches then we need
>>>>>> to refactor the simple dse_classify_store function. Let me take
>>>>>> an attempt at this today.
>>>>>
>>>>> First (baby) step is the following - it arranges to collect the
>>>>> defs we need to continue walking from and implements trivial
>>>>> reduction by stopping at (full) kills. This allows us to handle
>>>>> the new testcase (which was already handled in the very late DSE
>>>>> pass with the help of sinking the store).
>>>>>
>>>>> I took the opportunity to kill the use_stmt parameter of
>>>>> dse_classify_store as the only user is only looking for whether
>>>>> the kills were all clobbers which I added a new parameter for.
>>>>>
>>>>> I didn't adjust the byte-tracking case fully (I'm not fully understanding
>>>>> the code in the case of a use and I'm not sure whether it's worth
>>>>> doing the def reduction with byte-tracking).
>>>>>
>>>>> Your testcase can be handled by reducing the PHI and the call def
>>>>> by seeing that the only use of a candidate def is another def
>>>>> we have already processed. Not sure if worth special-casing though,
>>>>> I'd rather have a go at "recursing". That will be the next
>>>>> patch.
>>>>>
>>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>>>
>>>> Applied.
>>>>
>>>> Another intermediate one below, fixing the byte-tracking for
>>>> stmt with uses. This also re-does the PHI handling by simply
>>>> avoiding recursion by means of a visited bitmap and stopping
>>>> at the DSE classify stmt when re-visiting it instead of failing.
>>>> This covers Pratamesh loop case for which I added ssa-dse-33.c.
>>>> For the do-while loop this still runs into the inability to
>>>> handle two defs to walk from.
>>>>
>>>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>>
>>> Ok, loop handling doesn't work in general since we run into the
>>> issue that SSA form across the backedge is not representing the
>>> same values. Consider
>>>
>>> <bb 3>
>>> # .MEM_22 = PHI <.MEM_12(D)(2), .MEM_13(4)>
>>> # n_20 = PHI <0(2), n_7(4)>
>>> # .MEM_13 = VDEF <.MEM_22>
>>> bytes[n_20] = _4;
>>> if (n_20 > 7)
>>> goto <bb 5>;
>>>
>>> <bb 4>
>>> n_7 = n_20 + 1;
>>> # .MEM_15 = VDEF <.MEM_13>
>>> bytes[n_20] = _5;
>>> goto <bb 3>;
>>>
>>> then when classifying the store in bb4, visiting the PHI node
>>> gets us to the store in bb3 which appears to be killing.
>>>
>>> if (gimple_code (temp) == GIMPLE_PHI)
>>> - defvar = PHI_RESULT (temp);
>>> + {
>>> + /* If we visit this PHI by following a backedge then reset
>>> + any info in ref that may refer to SSA names which we'd need
>>> + to PHI translate. */
>>> + if (gimple_bb (temp) == gimple_bb (stmt)
>>> + || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
>>> + gimple_bb (temp)))
>>> + /* ??? ref->ref may not refer to SSA names or it may only
>>> + refer to SSA names that are invariant with respect to the
>>> + loop represented by this PHI node. */
>>> + ref->ref = NULL_TREE;
>>> + defvar = PHI_RESULT (temp);
>>> + bitmap_set_bit (visited, SSA_NAME_VERSION (defvar));
>>> + }
>>>
>>> should be a workable solution for that. I'm checking that, but
>>> eventually you can think of other things that might prevent us from
>>> handling backedges. Note the current code tries to allow
>>> looking across loops but not handle backedges of loops the
>>> original stmt belongs to.
>>
>> Just to mention before I leave for the day I think I've identified
>> a latent issue where I just fail to produce a testcase right now
>> which is that non-return exits from a function are not reliably
>> visited given they do not all have virtual operands, like
>> for example resx. Thus consider
>>
>> void foo (int *p, int x)
>> {
>> *p = 0;
>> if (x)
>> resx;
>> *p = 1;
>> return;
>> }
>>
>> where we will never visit any stmts on the resx path and thus think
>> the *p = 0 store is dead (all with current trunk of course).
>>
>> Will continue to dig tomorrow.
>
> PR85803.
>
> I am currently re-testing the following on x86_64-unknown-linux-gnu
> with --param dse-max-alias-queries-per-store=100000 in BOOT_CFLAGS.
>
> I'll refrain from doing the separate-walks-for-each-def thing
> but will try to fix PR85757 by pattern matching the
> def-has-single-virtual-use-that-is-another-def-in-the-vector case
> in which case we can remove it from further processing.
>
> Richard.
>
> 2018-05-16 Richard Biener <rguenther@suse.de>
>
> * params.def (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE): New param.
> * doc/invoke.texi (dse-max-alias-queries-per-store): Document.
> * tree-ssa-dse.c: Include tree-ssa-loop.h.
> (check_name): New callback.
> (dse_classify_store): Track cycles via a visited bitmap of PHI
> defs and simplify handling of in-loop and across loop dead stores
> and properly fail for loop-variant refs. Handle byte-tracking with
> multiple defs. Use PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE for
> limiting the walk.
>
> * gcc.dg/tree-ssa/ssa-dse-32.c: New testcase.
> * gcc.dg/tree-ssa/ssa-dse-33.c: Likewise.
Seems quite reasonable as well.
jeff
More information about the Gcc-patches
mailing list