This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)


On 12/20/2016 09:16 AM, Richard Biener wrote:
On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
Hi folks.

This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
here:

        https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand, which
actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already have
auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

        tainted:        26:52
        orig:           26:48

This was the average aggregate time of two runs compiling all 242 .ii files.
IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?

+  while (!worklist.is_empty ())
+    {
+      name = worklist.pop ();
+      gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+      if (ssa_undefined_value_p (name, true))
+       return true;
+
+      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the worklist",
so maybe renaming it would be a good thing as well.

I don't understand what you would prefer here.


+             if (TREE_CODE (name) == SSA_NAME)
+               {
+                 /* If an SSA has already been seen, this may be a loop.
+                    Fail conservatively.  */
+                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+                   return false;

so to me "conservative" is returning true, not false.

OK


+                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+                 worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And bitmap_set_bit
returns whether it set a bit, thus

                if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
                  worklist.safe_push (name);

should work?

Fixed.


+      /* Check that any SSA names used to define NAME is also fully
+        defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+       {
+         name = USE_FROM_PTR (use_p);
+         if (TREE_CODE (name) == SSA_NAME)

always true.

+           {
+             /* If an SSA has already been seen, this may be a loop.
+                Fail conservatively.  */
+             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+               return false;
+             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+             worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).

Done, though bitmaps are now calculated as part of the instantiation.


Also once you hit defs that are in a post-dominated region of the loop entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined behavior
when passing in undefined values.  So we need an extra parameter specifying
the post-dominance region.

Done.


You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
      || gimple_assign_single_p (def))
    return true;

As I asked previously, I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) == true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we want to follow up the def chain precisely on these?

The attached implementation uses a cache, and a pre-computed post-dominance region. A previous incantation of this patch (sans the post-dominance stuff) had performance within the noise of the previous implementation.

I am testing again, and will do some performance comparisons in a bit, but for now-- are we on the same page here? Is this what you wanted?

Aldy

p.s. I could turn the post-dominance region into a bitmap for faster access if preferred.

Attachment: curr
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]