This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR80693] drop value of parallel SETs dropped by combine


On Jul  7, 2017, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> I meant, just double check if your new
> code does the correct thing for the set count.

Sorry this took me so long to get back to.  Even this was difficult for
me to answer for sure, then and now.

We don't (and can't?) know whether the REG_UNUSED note originally
pertained to a clobber or a set, but I think that doesn't matter: unless
we've reused the REG in i2 as a scratch, I think we should decrement the
set count, because we used to have a SET or a CLOBBER that is now gone.

Looking back at the issue, I realized we should keep/add the REG_UNUSED
note to i2, if the reg is mentioned in i2, possibly turned into REG_DEAD
if i2 is referenced but not set.

Still, I'm concerned I haven't caught all of the cases in which
adjusting REG_N_SETS would be needed: we might have dropped multiple
SETs of the same pseudo, each with its own REG_UNUSED note (say, from
all of i3, i2, i1, and i0), and the current logic will only decrement
REG_N_SETS once, and only if i2 no longer sets the register.

I'm also concerned that the logic for when the REG is set or referenced
in i3 is incomplete: references in i3 might have come from any of the
previous insns, even if intervening sets of the same register were
substituted and thus removed.  Consider the following nightmarish scenario:

i0: (parallel [(set (reg CC) (op1 (reg A)))
               (set (reg A) (plus (reg A) (const_int 1)))])
    (REG_UNUSED (reg A))
i1: (set (reg A) (ne (reg CC) (const_int 0)))
    (REG_DEAD (reg CC))
i2: (parallel [(set (reg CC) (op2 (reg A)))
               (set (reg A) (plus (reg A) (const_int 1)))])
    (REG_UNUSED (reg A)))
i3: (set (reg A) (eq (reg CC) (const_int 0)))
    (REG_DEAD (reg CC))

we might turn that into say:

i2: (set (reg CC) (op3 (reg A)))
    (REG_DEAD (reg A))
i3: (set (reg A) (op4 (reg CC)))
    (REG_DEAD (reg CC))

and now we'd have to somehow figure out that we're to discount the two
unused sets of reg A, those from i0 and i2, and to turn either
REG_UNUSED note about reg A into a REG_DEAD note to be placed at i2.  A
is set at i3, so combine should record its new value, but if it's
computed in terms of the scratch CC and a much-older A, will we get the
correct value?  Or is the value unchanged because it's the output of the
latest insn?

Now, consider this slightly simpler scenario (trying to combine only the
first 3 insns):

i0: nil
i1: (parallel [(set (reg CC) (op1 (reg A)))
               (set (reg A) (plus (reg A) (const_int 1)))])
    (REG_UNUSED (reg A))
i2: (set (reg A) (ne (reg CC) (const_int 0)))
    (REG_DEAD (reg CC))
i3: (parallel [(set (reg CC) (op2 (reg A)))
               (set (reg A) (plus (reg A) (const_int 1)))])
    (REG_UNUSED (reg A)))

this might combine into:

i2: (set (reg A) (op5 (reg A)))
i3: (set (reg CC) (op6 (reg A)))
    (REG_DEAD (reg A))

and now we have removed 3 sets to A, but added 1 by splitting within
combine using A as scratch.  Would we then have to figure out that for
each of the REG_UNUSED notes pertaining to A we have to drop the
REG_N_SETS count by 1, although A remains used in i3, and set and used
in i2?  I don't see how.

I see that combine would record the value for reg A at i2 in this case,
but would it express it in terms of which earlier value of reg A?
Shouldn't we have reset it while placing notes in this case too?

> It wasn't obvious to me (this code is horribly complicated).  Whether
> all existing code is correct...  it's probably best not to look too
> closely :-/

You're right about its being horribly complicated.

Maybe I should go about it incrementally.

> If you have a patch you feel confident in, could you post it again
> please?

So let me tell you how I feel about this.  It has waited long enough,
and there are at least 3 bugs known to be fixed by the first very simple
patch below.  The catch is that it doesn't adjust REG_N_SETS at all (we
didn't before the patch, and that didn't seem to hurt too much).  I've
regstrapped this successfully on x86_64-linux-gnu and i686-linux-gnu.

--->cut<---
When combine drops a REG_UNUSED SET in a parallel, we have to clear
cached values, so that, even if the REGs remain used (e.g. because
they were referenced in the used SET_SRC), we will not use properties
of the dropped modified value as if they applied to the preserved
original one.

We fail to adjust REG_N_SETS.

for  gcc/ChangeLog

	PR rtl-optimization/80693
	PR rtl-optimization/81019
	PR rtl-optimization/81020
	* combine.c (distribute_notes): Reset any REG_UNUSED REGs that
	are not mentioned in i3.  Place the REG_UNUSED note on i2,
	possibly modified to REG_DEAD, if it did not originate in i3.

for  gcc/testsuite/ChangeLog

	PR rtl-optimization/80693
	PR rtl-optimization/81019
	PR rtl-optimization/81020
	* gcc.dg/pr80693.c: New.
	* gcc.dg/pr81019.c: New.
---
 gcc/combine.c                  |   40 ++++++++++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr80693.c |   26 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/pr81019.c |   27 +++++++++++++++++++++++++++
 3 files changed, 93 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr80693.c
 create mode 100644 gcc/testsuite/gcc.dg/pr81019.c

diff --git a/gcc/combine.c b/gcc/combine.c
index 863abd7a282b..976b5f2e519b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14254,6 +14254,46 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
 	      PUT_REG_NOTE_KIND (note, REG_DEAD);
 	      place = i3;
 	    }
+
+	  /* A SET or CLOBBER of the REG_UNUSED reg has been removed,
+	     but we can't tell which at this point.  We must reset any
+	     expectations we had about the value that was previously
+	     stored in the reg.  ??? Ideally, we'd adjust REG_N_SETS
+	     and, if appropriate, restore its previous value, but we
+	     don't have enough information for that at this point.  */
+	  else
+	    {
+	      record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
+
+	      /* Otherwise, if this register is now referenced in i2
+		 then the register used to be modified in one of the
+		 original insns.  If it was i3 (say, in an unused
+		 parallel), it's now completely gone, so the note can
+		 be discarded.  But if it was modified in i2, i1 or i0
+		 and we still reference it in i2, then we're
+		 referencing the previous value, and since the
+		 register was modified and REG_UNUSED, we know that
+		 the previous value is now dead.  So, if we only
+		 reference the register in i2, we change the note to
+		 REG_DEAD, to reflect the previous value.  However, if
+		 we're also setting or clobbering the register as
+		 scratch, we know (because the register was not
+		 referenced in i3) that it's unused, just as it was
+		 unused before, and we place the note in i2.  */
+	      if (from_insn != i3 && i2 && INSN_P (i2)
+		  && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
+		{
+		  if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
+		    PUT_REG_NOTE_KIND (note, REG_DEAD);
+		  if (! (REG_P (XEXP (note, 0))
+			 ? find_regno_note (i2, REG_NOTE_KIND (note),
+					    REGNO (XEXP (note, 0)))
+			 : find_reg_note (i2, REG_NOTE_KIND (note),
+					  XEXP (note, 0))))
+		    place = i2;
+		}
+	    }
+
 	  break;
 
 	case REG_EQUAL:
diff --git a/gcc/testsuite/gcc.dg/pr80693.c b/gcc/testsuite/gcc.dg/pr80693.c
new file mode 100644
index 000000000000..507177167e58
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80693.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-coalesce-vars" } */
+typedef unsigned char u8;
+typedef unsigned short u16;
+typedef unsigned u32;
+typedef unsigned long long u64;
+
+static u64 __attribute__((noinline, noclone))
+foo(u8 u8_0, u16 u16_0, u32 u32_0, u64 u64_0,  u16 u16_1)
+{
+  u16_1 += 0x1051;
+  u16_1 &= 1;
+  u8_0 <<= u32_0 & 7;
+  u16_0 -= !u16_1;
+  u16_1 >>= ((u16)-u8_0 != 0xff);
+  return u8_0 + u16_0 + u64_0 + u16_1;
+}
+
+int
+main (void)
+{
+  u64 x = foo(1, 1, 0xffff, 0, 1);
+  if (x != 0x80)
+    __builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr81019.c b/gcc/testsuite/gcc.dg/pr81019.c
new file mode 100644
index 000000000000..cf13bfa92758
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81019.c
@@ -0,0 +1,27 @@
+/* PR rtl-optimization/81019 */
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-ccp" } */
+
+unsigned long long __attribute__((noinline, noclone))
+foo (unsigned char a, unsigned short b, unsigned c, unsigned long long d,
+     unsigned char e, unsigned short f, unsigned g, unsigned long long h)
+{
+  g = e;
+  c &= 0 < d;
+  b *= d;
+  g ^= -1;
+  g &= 1;
+  c |= 1;
+  a -= 0 < g;
+  g >>= 1;
+  f = b | (f >> b);
+  return a + c + d + f + g + h;
+}
+
+int
+main (void)
+{
+  if (foo (0, 0, 0, 0, 0, 0, 0, 0) != 0x100)
+    __builtin_abort ();
+  return 0;
+}
--->cut<---


But then it occurred to me that the REG_UNUSED register from i1 might
have *also* been set in i2, and we might then have used it as a scratch
register for split, while the set that the REG_UNUSED pertained to might
have been completely discarded from a parallel in i1 or i0.  And then I
wasn't unsure whether or not to decrement REG_N_SETS.

diff --git a/gcc/combine.c b/gcc/combine.c
index 863abd7a282b..277492af9108 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14254,6 +14254,59 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
 	      PUT_REG_NOTE_KIND (note, REG_DEAD);
 	      place = i3;
 	    }
+
+	  /* A SET or CLOBBER of the REG_UNUSED reg has been removed,
+	     so we we must reset any expectations we had about the
+	     value that was previously stored in the reg, and adjust
+	     REG_N_SETS.  Ideally, we'd restore the register to its
+	     previous value, but we don't have enough information for
+	     that at this point.  */
+	  else
+	    {
+	      record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
+
+	      bool set_p = false;
+
+	      /* Otherwise, if this register is now referenced in i2
+		 then the register used to be modified in one of the
+		 original insns.  If it was i3 (say, in an unused
+		 parallel), it's now completely gone, so the note can
+		 be discarded.  But if it was modified in i2, i1 or i0
+		 and we still reference it in i2, then we're
+		 referencing the previous value, and since the
+		 register was modified and REG_UNUSED, we know that
+		 the previous value is now dead.  So, if we only
+		 reference the register in i2, we change the note to
+		 REG_DEAD, to reflect the previous value.  However, if
+		 we're also setting or clobbering the register as
+		 scratch, we know (because the register was not
+		 referenced in i3) that it's unused, just as it was
+		 unused before, and we place the note in i2.  */
+	      if (from_insn != i3 && i2 && INSN_P (i2))
+		{
+		  set_p = reg_set_p (XEXP (note, 0), PATTERN (i2));
+
+		  if (set_p
+		      || reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
+		    {
+		      if (!set_p)
+			PUT_REG_NOTE_KIND (note, REG_DEAD);
+
+		      if (! (REG_P (XEXP (note, 0))
+			     ? find_regno_note (i2, REG_NOTE_KIND (note),
+						REGNO (XEXP (note, 0)))
+			     : find_reg_note (i2, REG_NOTE_KIND (note),
+					      XEXP (note, 0))))
+			place = i2;
+		    }
+		}
+
+	      /* If we no longer have a set, drop the set count.  */
+	      if (!set_p && REG_P (XEXP (note, 0))
+		  && REGNO (XEXP (note, 0)) < reg_n_sets_max)
+		INC_REG_N_SETS (REGNO (XEXP (note, 0)), -1);
+	    }
+
 	  break;
 
 	case REG_EQUAL:


Then I realized we might have to also reset the cached value when it's
referenced in i3, and possibly also adjust the counts.  And also some of
that when it's set in i3.  Then I lost it.  Help? :-)

FWIW, I'd be glad just installing the patch between --->cut<---s, or
even a simpler version thereof that just resets the recorded value and
doesn't even attempt to place notes in i2, to get the bugs fixed while
we sort out the really tricky issues of adjusting REG_N_SETS.  The
(incomplete, but functional) fix has been known for a while, and we
shouldn't subject users to wrong code when we can help it, even if we
might miss optimization opportunities for that, right?  Thoughts?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]