This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)


On 01/04/2017 07:11 AM, Richard Biener wrote:
On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
On 12/20/2016 09:16 AM, Richard Biener wrote:

On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:

Hi folks.

This is a follow-up on Jeff and Richi's interaction on the aforementioned
PR
here:

        https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand, which
actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already
have
auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code
we
already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

        tainted:        26:52
        orig:           26:48

This was the average aggregate time of two runs compiling all 242 .ii
files.
IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?


+  while (!worklist.is_empty ())
+    {
+      name = worklist.pop ();
+      gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+      if (ssa_undefined_value_p (name, true))
+       return true;
+
+      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the
worklist",
so maybe renaming it would be a good thing as well.


I don't understand what you would prefer here.

Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.


+             if (TREE_CODE (name) == SSA_NAME)
+               {
+                 /* If an SSA has already been seen, this may be a loop.
+                    Fail conservatively.  */
+                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+                   return false;

so to me "conservative" is returning true, not false.


OK


+                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+                 worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And
bitmap_set_bit
returns whether it set a bit, thus

                if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
                  worklist.safe_push (name);

should work?


Fixed.


+      /* Check that any SSA names used to define NAME is also fully
+        defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+       {
+         name = USE_FROM_PTR (use_p);
+         if (TREE_CODE (name) == SSA_NAME)

always true.

+           {
+             /* If an SSA has already been seen, this may be a loop.
+                Fail conservatively.  */
+             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+               return false;
+             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+             worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a
bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).


Done, though bitmaps are now calculated as part of the instantiation.


Also once you hit defs that are in a post-dominated region of the loop
entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined
behavior
when passing in undefined values.  So we need an extra parameter
specifying
the post-dominance region.


Done.


You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
      || gimple_assign_single_p (def))
    return true;


As I asked previously, I understand the !is_gimple_assign, which will skip
over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) ==
true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
want to follow up the def chain precisely on these?

The attached implementation uses a cache, and a pre-computed post-dominance
region.  A previous incantation of this patch (sans the post-dominance
stuff) had performance within the noise of the previous implementation.

I am testing again, and will do some performance comparisons in a bit, but
for now-- are we on the same page here?  Is this what you wanted?

+      /* DEFs in the post-dominated region of the loop entry invoke
+        undefined behavior.  Adding any use won't make things any
+        worse.  */
+      for (unsigned i = 0; i < postdom.length (); ++i)
+       if (def->bb == postdom[i])

gimple_bb (def)

+         {
+           set_defined (t);
+           return false;
+         }

I think you can't really return false here but need to continue processing
the worklist.  I'd say rather than a linear walk you can as well use
dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry
block instead?

Note that the way you compute post-dominators doesn't work -- nothing
keeps them up-to-date when unswitching does CFG manipulations :/
Fortunately we have a way to compute them per region, see how tree-if-conv.c
does that (build_region, calculate_dominance_info_for_region).

If I understand correctly, we could compute them per region in tree_unswitch_single_loop() for each individual loop with what tree-if-conv.c uses.

I could certainly abstract build_region/calculate_dominance_info_for_region into something new, say calculate_dominance_info_for_loop_region(). Then we could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you suggest.

Question... computing the post-dom tree per region as I've just described will only create post-dom information for the loop region, which means that any definition outside of the loop will have zero dominance info.

What if the definition in question post-dominates the loop entry but is outside/after of the loop? We will have no post-dom information for this. In this case, could we just ignore and continue evaluating the SSA_NAME with the rest of our heuristics, or did you have something else in mind? Another option would be to calculate the post-dom information for the entire function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but that seems rather expensive.

As a side note, at this point I just want to fix/close the PR in an acceptable manner, not come up with the end-all catch-all most-efficient patch for an unlikely scenario ;-).

Thanks for taking the time to review and offer suggestions here.

Aldy


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]