[PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

Richard Biener <richard.guenther@gmail.com>
Wed Jan 18 12:45:00 GMT 2017


On Tue, Jan 17, 2017 at 3:42 PM, Jeff Law <law@redhat.com> wrote:
> On 01/17/2017 02:15 AM, Richard Biener wrote:
>>
>> On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <law@redhat.com>
>>> wrote:
>>>>
>>>> On 01/16/2017 01:51 AM, Richard Biener wrote:
>>>>>
>>>>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>> At one time I know I had the max_size == size test in
>>>>>> valid_ao_ref_for_dse.  But it got lost at some point.  This is what
>>>>>> caused the Ada failure.
>>>>>>
>>>>>> Technically it'd be OK for the potentially dead store to have a
>>>>>> variable size as long as the later stores covered the entire range
>>>>>> of the potentially dead store.  I doubt this happens enough to be
>>>>>> worth checking.
>>>>>>
>>>>>> The ppc64 big endian failures were more interesting.  We had this in
>>>>>> the IL:
>>>>>>
>>>>>> memmove (dst, src, 0)
>>>>>>
>>>>>> The trimming code assumes that there's at least one live byte in the
>>>>>> store, which obviously isn't the case here.  The net result is we
>>>>>> compute an incorrect trim and the copy goes wild with incorrect
>>>>>> addresses and lengths.  This is trivial to fix by validating that
>>>>>> the store has a nonzero length.
>>>>>>
>>>>>> I was a bit curious how often this happened in practice because such
>>>>>> a call is trivially dead.  ~80 during a bootstrap and a few dozen in
>>>>>> the testsuite.  Given how trivial it is to detect and optimize, this
>>>>>> patch includes removal of such calls.  This hunk makes the check for
>>>>>> zero size in valid_ao_ref_for_dse redundant, but I'd like to keep
>>>>>> the check -- if we add more builtin support without filtering zero
>>>>>> size we'd regress again.
>>>>>
>>>>>
>>>>> Interesting - we do fold memset (..., 0) away, so this means we
>>>>> have an unfolded memset stmt in the IL before DSE.
>>>>
>>>> It's actually exposed by fre3, both in the original test and in the
>>>> reduced testcase.  In the reduced testcase we have this just prior to
>>>> FRE3:
>>>>
>>>> ;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>>>> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>>>> ;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _3 = MEM[(const struct vec *)_4].m_num;
>>>>   if (_3 != 0)
>>>>     goto <bb 4>; [36.64%]
>>>>   else
>>>>     goto <bb 5>; [63.36%]
>>>> ;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>> ;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>>>>
>>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _6 = vec_av_set.m_vec;
>>>>   _7 = _6->m_num;
>>>>   _8 = _7 - _3;
>>>>   _6->m_num = _8;
>>>>   _9 = (long unsigned int) _8;
>>>>   _10 = _9 * 4;
>>>>   slot.2_11 = slot;
>>>>   dest.3_12 = dest;
>>>>   memmove (dest.3_12, slot.2_11, _10);
>>>> ;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>>>>
>>>>
>>>> _3 has the value _6->m_num.  Thus _8 will have the value 0, which in
>>>> turn makes _10 have the value zero as seen in the .fre3 dump:
>>>>
>>>> ;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>>>> ;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>>>> ;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>>>>   _4->m_num = 0;
>>>>   slot.2_11 = slot;
>>>>   dest.3_12 = dest;
>>>>   memmove (dest.3_12, slot.2_11, 0);
>>>>
>>>> In the full test it's similar.
>>>>
>>>> I don't know if you want to try and catch this in FRE though.
>>>
>>>
>>> Ah, I think I've had patches for this in my tree for a long time...
>>> We're folding calls in a restricted way for some historical reason.
>>>
>>>> If we detect this in DCE (where it makes reasonable sense) rather than
>>>> DSE, then we detect the dead mem* call about 17 passes earlier and the
>>>> dead argument setup about 20 passes earlier.  In the testcase I looked
>>>> at, I didn't see additional secondary optimizations enabled, but I
>>>> could imagine cases where it might.  Seems like a gcc-8 thing though.
>>>
>>>
>>> I'll give it a quick look tomorrow.
>>
>>
>> The comment before fold_stmt_inplace no longer applies (but it seems I
>> simply forgot to push this change...).  It's better not to keep unfolded
>> stmts around, so I'll commit this as the last bit of stage3 if testing
>> is fine.
>>
>> Bootstrap / regtest on x86_64-unknown-linux-gnu in progress.
>>
>> Richard.
>>
>> 2017-01-17  Richard Biener  <rguenther@suse.de>
>>
>>         * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
>>         Fold calls regularly.
>>
>>         * gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.
>>
> Note you'll need the trivial update to the new ssa-dse testcase as it
> verifies removal of the dead memmove.

Yep, the patch had other fallout so I'm postponing it for GCC 8.

Richard.

> jeff
>


