[PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

Jeff Law law@redhat.com
Tue Jan 17 14:42:00 GMT 2017


On 01/17/2017 02:15 AM, Richard Biener wrote:
> On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>>> On 01/16/2017 01:51 AM, Richard Biener wrote:
>>>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> At one time I know I had the max_size == size test in
>>> valid_ao_ref_for_dse.
>>>>> But it got lost at some point.  This is what caused the Ada failure.
>>>>>
>>>>> Technically it'd be OK for the potentially dead store to have a
>>> variable
>>>>> size as long as the later stores covered the entire range of the
>>> potentially
>>>>> dead store.  I doubt this happens enough to be worth checking.
>>>>>
>>>>> The ppc64 big endian failures were more interesting.  We had this in
>>> the IL:
>>>>>
>>>>> memmove (dst, src, 0)
>>>>>
>>>>> The trimming code assumes that there's at least one live byte in the
>>> store,
>>>>> which obviously isn't the case here.  The net result is we compute
>>> an
>>>>> incorrect trim and the copy goes wild with incorrect addresses and
>>> lengths.
>>>>> This is trivial to fix by validating that the store has a nonzero
>>> length.
>>>>>
>>>>> I was a bit curious how often this happened in practice because such
>>> a call
>>>>> is trivially dead.  ~80 during a bootstrap and a few dozen in the
>>> testsuite.
>>>>> Given how trivial it is to detect and optimize, this patch includes
>>> removal
>>>>> of such calls.  This hunk makes the check for zero size in
>>>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if
>>> we add
>>>>> more builtin support without filtering zero size we'd regress again.
>>>>
>>>> Interesting - we do fold memset (..., 0) away so this means we either
>>>> have an unfolded memset stmt in the IL before DSE.
>>> It's actually exposed by fre3, both in the original test and in the
>>> reduced testcase.  In the reduced testcase we have this just prior to
>>> FRE3:
>>>
>>> ;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>>> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>>>   _3 = MEM[(const struct vec *)_4].m_num;
>>>   if (_3 != 0)
>>>     goto <bb 4>; [36.64%]
>>>   else
>>>     goto <bb 5>; [63.36%]
>>> ;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>> ;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>>>
>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>   _6 = vec_av_set.m_vec;
>>>   _7 = _6->m_num;
>>>   _8 = _7 - _3;
>>>   _6->m_num = _8;
>>>   _9 = (long unsigned int) _8;
>>>   _10 = _9 * 4;
>>>   slot.2_11 = slot;
>>>   dest.3_12 = dest;
>>>   memmove (dest.3_12, slot.2_11, _10);
>>> ;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>>>
>>>
>>> _3 has the value _6->m_num.  Thus _8 will have the value 0, which in
>>> turn makes _10 have the value zero as seen in the .fre3 dump:
>>>
>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>   _4->m_num = 0;
>>>   slot.2_11 = slot;
>>>   dest.3_12 = dest;
>>>   memmove (dest.3_12, slot.2_11, 0);
>>>
>>> In the full test its similar.
>>>
>>> I don't know if you want to try and catch this in FRE though.
>>
>> Ah, I think I have patches for this since a long time in my tree...  We're folding calls in a restricted way for some historical reason.
>>
>>> If we detect in DCE (where it makes reasonable sense) rather than DSE,
>>> then we detect the dead mem* about 17 passes earlier and the dead
>>> argument setup about 20 passes earlier.  In the testcase I looked at, I
>>>
>>> didn't see additional secondary optimizations enabled, but I could
>>> imagine cases where it might.  Seems like a gcc-8 thing though.
>>
>> I'll give it a quick look tomorrow.
>
> The comment before fold_stmt_inplace no longer applies (but it seems I
> simply forgot to push
> this change...).  It's better to not keep unfolded stmts around, so
> I'll commit this as last bit
> of stage3 if testing is fine.
>
> Bootstrap / regtest on x86_64-unknown-linux-gnu in progress.
>
> Richard.
>
> 2017-01-17  Richard Biener  <rguenther@suse.de>
>
>         * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
>         Fold calls regularly.
>
>         * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.
>
Note you'll need the trivial update to the new ssa-dse testcase as it 
verifies removal of the dead memmove.

jeff



More information about the Gcc-patches mailing list