This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)


On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>
>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>
>>> Hi folks.
>>>
>>> This is a follow-up on Jeff and Richi's interaction on the aforementioned
>>> PR
>>> here:
>>>
>>>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>
>>> I decided to explore the idea of analyzing may-undefness on-demand, which
>>> actually looks rather cheap.
>>>
>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>> have
>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code
>>> we
>>> already have.
>>>
>>> The attached patch fixes the bug without introducing any regressions.
>>>
>>> I also tested the patch by compiling 242 .ii files with -O3.  These were
>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>> degradation of 4 seconds within 27 minutes of user time:
>>>
>>>         tainted:        26:52
>>>         orig:           26:48
>>>
>>> This was the average aggregate time of two runs compiling all 242 .ii
>>> files.
>>> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?
>>
>>
>> +  while (!worklist.is_empty ())
>> +    {
>> +      name = worklist.pop ();
>> +      gcc_assert (TREE_CODE (name) == SSA_NAME);
>> +
>> +      if (ssa_undefined_value_p (name, true))
>> +       return true;
>> +
>> +      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>
>> it should be already set as we use visited_ssa as "was it ever on the
>> worklist",
>> so maybe renaming it would be a good thing as well.
>
>
> I don't understand what you would prefer here.

Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.

>>
>> +             if (TREE_CODE (name) == SSA_NAME)
>> +               {
>> +                 /* If an SSA has already been seen, this may be a loop.
>> +                    Fail conservatively.  */
>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +                   return false;
>>
>> so to me "conservative" is returning true, not false.
>
>
> OK
>
>>
>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> +                 worklist.safe_push (name);
>>
>> but for loops we can just continue and ignore this use.  And
>> bitmap_set_bit
>> returns whether it set a bit, thus
>>
>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
>>                   worklist.safe_push (name);
>>
>> should work?
>
>
> Fixed.
>
>>
>> +      /* Check that any SSA names used to define NAME is also fully
>> +        defined.  */
>> +      use_operand_p use_p;
>> +      ssa_op_iter iter;
>> +      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>> +       {
>> +         name = USE_FROM_PTR (use_p);
>> +         if (TREE_CODE (name) == SSA_NAME)
>>
>> always true.
>>
>> +           {
>> +             /* If an SSA has already been seen, this may be a loop.
>> +                Fail conservatively.  */
>> +             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +               return false;
>> +             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> +             worklist.safe_push (name);
>>
>> See above.
>>
>> In principle the thing is sound but I'd like to be able to pass in a
>> bitmap of
>> known maybe-undefined/must-defined SSA names to have a cache for
>> multiple invocations from within a pass (like this unswitching case).
>
>
> Done, though bitmaps are now calculated as part of the instantiation.
>
>>
>> Also once you hit defs that are in a post-dominated region of the loop
>> entry
>> you can treat them as not undefined (as their use invokes undefined
>> behavior).  This is also how you treat function parameters (well,
>> ssa_undefined_value_p does), where the call site invokes undefined
>> behavior
>> when passing in undefined values.  So we need an extra parameter
>> specifying
>> the post-dominance region.
>
>
> Done.
>
>>
>> You do not handle memory or calls conservatively which means the existing
>> testcase only needs some obfuscation to become a problem again.  To fix
>> that before /* Check that any SSA names used to define NAME is also fully
>> defined.  */ bail out conservatively, like
>>
>>    if (! is_gimple_assign (def)
>>       || gimple_assign_single_p (def))
>>     return true;
>
>
> As I asked previously, I understand the !is_gimple_assign, which will skip
> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) ==
> true"??
>
> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
> want to follow up the def chain precisely on these?
>
> The attached implementation uses a cache, and a pre-computed post-dominance
> region.  A previous incantation of this patch (sans the post-dominance
> stuff) had performance within the noise of the previous implementation.
>
> I am testing again, and will do some performance comparisons in a bit, but
> for now-- are we on the same page here?  Is this what you wanted?

+      /* DEFs in the post-dominated region of the loop entry invoke
+        undefined behavior.  Adding any use won't make things any
+        worse.  */
+      for (unsigned i = 0; i < postdom.length (); ++i)
+       if (def->bb == postdom[i])

gimple_bb (def)

+         {
+           set_defined (t);
+           return false;
+         }

I think you can't really return false here but need to continue processing
the worklist.  I'd say rather than a linear walk you can as well use
dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
block instead?

Note that the way you compute post-dominators doesn't work -- nothing
keeps them up-to-date when unswitching does CFG manipulations :/
Fortunately we have a way to compute them per region, see how tree-if-conv.c
does that (build_region, calculate_dominance_info_for_region).

+      /* Handle memory and calls conservatively.  */
+      if (!is_gimple_assign (def)
+         /* ?? Do we really want this?  The code below will return
+            TRUE for something like "e.3_7 = e", which is probably
+            not what we want.
+         || gimple_assign_single_p (def)

as said for loads it _is_ what we want (the memory may be uninitialized).
You can make it

     || (gimple_assign_single_p (def)
         && gimple_vuse (def))

to better catch that though (stop at memory loads only).

--

The way you add things to the cache (or not) is somewhat odd probably
given that you don't really compute for anything but the arg to
is_maybe_undefined
whether it is maybe undefined or not?

I am missing a cache check in the worklist processing at least but I also don't
see how common "tails" in SSA use trees of different names are reliably
cached (and I'm not sure it's easily possible to populate the cache for all
visited SSA names with conservative values -- a recursive implementation
would have made that obvious but with a worklist it might be tricky).

Given the way it is used from unswitching and given (citing from it):

  /* Condition must be invariant.  */
  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
    {
      /* Unswitching on undefined values would introduce undefined
         behavior that the original program might never exercise.  */
      if (ssa_undefined_value_p (use, true))
        return NULL_TREE;
      def = SSA_NAME_DEF_STMT (use);
      def_bb = gimple_bb (def);
      if (def_bb
          && flow_bb_inside_loop_p (loop, def_bb))
^^^

we do not unswitch on exprs that are defined inside the loop at all, thus all
SSA use-def walking is pointless here (I think in your worklist processing
you miss that you can stop walking when you see a (non-PHI) def that
dominates the post-dom region entry).  The above also suggests to do that
flow_bb_inside_loop_p check first, eventually even for all stmt operands.


Thanks,
Richard.

> Aldy
>
> p.s. I could turn the post-dominance region into a bitmap for faster access
> if preferred.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]