[PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

Jeff Law law@redhat.com
Mon Jan 16 18:27:00 GMT 2017


On 01/16/2017 01:51 AM, Richard Biener wrote:
> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <law@redhat.com> wrote:
>>
>> At one time I know I had the max_size == size test in valid_ao_ref_for_dse.
>> But it got lost at some point.  This is what caused the Ada failure.
>>
>> Technically it'd be OK for the potentially dead store to have a variable
>> size as long as the later stores covered the entire range of the potentially
>> dead store.  I doubt this happens enough to be worth checking.
>>
>> The ppc64 big endian failures were more interesting.  We had this in the IL:
>>
>> memmove (dst, src, 0)
>>
>> The trimming code assumes that there's at least one live byte in the store,
>> which obviously isn't the case here.  The net result is we compute an
>> incorrect trim and the copy goes wild with incorrect addresses and lengths.
>> This is trivial to fix by validating that the store has a nonzero length.
>>
>> I was a bit curious how often this happened in practice because such a call
>> is trivially dead.  ~80 during a bootstrap and a few dozen in the testsuite.
>> Given how trivial it is to detect and optimize, this patch includes removal
>> of such calls.  This hunk makes the check for zero size in
>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if we add
>> more builtin support without filtering zero size we'd regress again.
>
> Interesting - we do fold memset (..., 0) away so this means we either
> have an unfolded memset stmt in the IL before DSE.
It's actually exposed by fre3, both in the original test and in the 
reduced testcase.  In the reduced testcase we have this just prior to FRE3:

;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
   _3 = MEM[(const struct vec *)_4].m_num;
   if (_3 != 0)
     goto <bb 4>; [36.64%]
   else
     goto <bb 5>; [63.36%]
;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
   _6 = vec_av_set.m_vec;
   _7 = _6->m_num;
   _8 = _7 - _3;
   _6->m_num = _8;
   _9 = (long unsigned int) _8;
   _10 = _9 * 4;
   slot.2_11 = slot;
   dest.3_12 = dest;
   memmove (dest.3_12, slot.2_11, _10);
;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)


_3 has the value _6->m_num.  Thus _8 will have the value 0, which in 
turn makes _10 have the value zero as seen in the .fre3 dump:

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
   _4->m_num = 0;
   slot.2_11 = slot;
   dest.3_12 = dest;
   memmove (dest.3_12, slot.2_11, 0);

In the full test its similar.

I don't know if you want to try and catch this in FRE though.

If we detect in DCE (where it makes reasonable sense) rather than DSE, 
then we detect the dead mem* about 17 passes earlier and the dead 
argument setup about 20 passes earlier.  In the testcase I looked at, I 
didn't see additional secondary optimizations enabled, but I could 
imagine cases where it might.  Seems like a gcc-8 thing though.

Jeff




More information about the Gcc-patches mailing list